# Import the necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import os
# Load the dataset from Colab
from google.colab import files
uploaded = files.upload()
Saving multi_method_importance.csv to multi_method_importance.csv
# Read the dataset
df = pd.read_csv('Health Dataset5.csv')
train_all = pd.read_csv('train_all.csv')
test_all = pd.read_csv('test_all.csv')
train_df_transformed = pd.read_csv('train_df_transformed.csv')
test_df_transformed = pd.read_csv('test_df_transformed.csv')
df_transformed = pd.read_csv('df_transformed.csv')
df_combined_with_country = pd.read_csv('df_combined_with_country (1).csv')
df_lagged = pd.read_csv('df_lagged.csv')
df_combined = pd.read_csv('df_combined (1).csv')
multi_method_importance = pd.read_csv('multi_method_importance.csv')
df_model_comparison = pd.read_csv('df_model_comparison.csv')
QQ Plot of Residuals, Residuals vs. Fitted Values Plot¶
This plot helps check for the assumptions of linearity and constant variance for a linear regression model.
If curved residual patterns, it indicates that the relationship between predictors and the target is not linear that a linear model may be inappropriate.
The funnel shapes increasing or decreasing spread, which means the variance of the residuals is not constant across all fitted values. This violates one of the key assumptions of linear regression and can lead to inefficient and biased estimates
import statsmodels.api as sm
import matplotlib.pyplot as plt
from scipy import stats
# List of predictors (make sure column names match exactly in your dataframe)
features = [
'Income', 'GDP', 'CPI', 'Sex ratio',
'BMI (female)', 'Cost of a healthy diet', 'Inflation',
'Incomplete tertiary education', 'Gini coefficient', 'Median age'
]
# Loop through each target variable
for target in ['Life expectancy', 'Cardiovascular diseases', 'Diabetes']:
print(f"\nModeling for: {target}")
# Subset and drop rows with missing values
model_data = df[[target] + features].dropna()
X = model_data[features]
y = model_data[target]
# Add constant (intercept)
X = sm.add_constant(X)
# Fit OLS regression model
model = sm.OLS(y, X).fit()
residuals = model.resid
# --- QQ Plot ---
plt.figure(figsize=(6, 4))
stats.probplot(residuals, dist="norm", plot=plt)
plt.title(f'QQ Plot of Residuals - {target}')
plt.grid(True)
plt.show()
# --- Residuals vs. Fitted Values Plot ---
plt.figure(figsize=(6, 4))
plt.scatter(model.fittedvalues, residuals, alpha=0.5)
plt.axhline(0, color='red', linestyle='--')
plt.title(f'Residuals vs Fitted - {target}')
plt.xlabel('Fitted Values')
plt.ylabel('Residuals')
plt.grid(True)
plt.show()
# --- Residual Summary ---
print("Residuals Summary:")
print(f" Mean: {residuals.mean():.4f}")
print(f" Std Dev: {residuals.std():.4f}")
print(f" Skewness: {residuals.skew():.4f}")
print(f" Kurtosis: {residuals.kurtosis():.4f}")
# --- Shapiro-Wilk Test for Normality ---
shapiro_test = stats.shapiro(residuals)
print(f" Shapiro-Wilk: Statistic={shapiro_test.statistic:.4f}, p-value={shapiro_test.pvalue:.4f}")
if shapiro_test.pvalue > 0.05:
print(" Residuals are approximately normal.")
else:
print(" Residuals deviate from normality.")
Modeling for: Life expectancy
Residuals Summary: Mean: -0.0000 Std Dev: 8.2276 Skewness: -1.0599 Kurtosis: 1.9206 Shapiro-Wilk: Statistic=0.9420, p-value=0.0000 Residuals deviate from normality. Modeling for: Cardiovascular diseases
/usr/local/lib/python3.11/dist-packages/scipy/stats/_axis_nan_policy.py:586: UserWarning: scipy.stats.shapiro: For N > 5000, computed p-value may not be accurate. Current N is 17504. res = hypotest_fun_out(*samples, **kwds)
Residuals Summary: Mean: 0.0001 Std Dev: 120.4087 Skewness: 4.6438 Kurtosis: 59.1114 Shapiro-Wilk: Statistic=0.3134, p-value=0.0000 Residuals deviate from normality. Modeling for: Diabetes
/usr/local/lib/python3.11/dist-packages/scipy/stats/_axis_nan_policy.py:586: UserWarning: scipy.stats.shapiro: For N > 5000, computed p-value may not be accurate. Current N is 17504. res = hypotest_fun_out(*samples, **kwds)
Residuals Summary: Mean: -0.0000 Std Dev: 3.2707 Skewness: 1.4892 Kurtosis: 5.4749 Shapiro-Wilk: Statistic=0.8798, p-value=0.0000 Residuals deviate from normality.
/usr/local/lib/python3.11/dist-packages/scipy/stats/_axis_nan_policy.py:586: UserWarning: scipy.stats.shapiro: For N > 5000, computed p-value may not be accurate. Current N is 17504. res = hypotest_fun_out(*samples, **kwds)
The results of the QQ plot and Residual vs Fitted value:
Life Expectancy The residuals for the life expectancy model have a near-zero mean, which is good. However, they exhibit moderate left skew (skewness = -1.059) and slightly lower-than-normal kurtosis (1.92), suggesting they are not perfectly normally distributed. The Shapiro-Wilk test confirms this, with a p-value of 0.0000 indicating a significant deviation from normality. The QQ plot likely shows curved tails, and if the residuals vs. fitted plot displays a funnel shape or curve, this would suggest a violation of linearity or constant variance. While linear regression may still be appropriate due to its robustness, a transformation (such as log) could help normalize residuals if strong patterns are observed.
Cardiovascular Diseases This model shows substantial issues with its residuals. The residual mean is 5.2 (ideally it should be closer to 0), and the skewness is very high (4.64), indicating extreme right-skew. The kurtosis value of 59.11 is also very large, pointing to heavy tails and likely outliers. With a Shapiro-Wilk p-value of 0.0000, the residuals strongly violate the assumption of normality. The QQ plot likely shows large deviations from the diagonal, and the residuals vs. fitted plot probably reveals non-random patterns and uneven spread. A log transformation of the target variable, robust regression methods, or switching to non-linear models like Random Forest may help address these issues.
Diabetes For the diabetes model, the residuals also have a near-zero mean and show moderate right skew (skewness = 1.489) with heavier tails than normal (kurtosis = 5.47). Though not extreme, the Shapiro-Wilk test still reports a p-value of 0.0000, suggesting the residuals are not normally distributed. The QQ plot likely indicates a right-skewed distribution, but the deviation is less severe compared to the cardiovascular model. If the residuals vs. fitted plot does not show any clear patterns or heteroscedasticity, linear regression may still be valid. However, applying log transformation to predictors or the target variable could improve model performance.
Histogram and KDE Plot¶
Histogram and KDE Plot are used to visualize the normalization for each variable
# Histogram and Skewness Summary
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Replace this with your actual DataFrame
# df = pd.read_csv('your_dataset.csv')
# Identify numeric columns
numeric_cols = df.select_dtypes(include='number').columns
# Calculate skewness
skewness_summary = df[numeric_cols].skew().sort_values(ascending=False)
print("Skewness Summary:")
print(skewness_summary)
# Plot histogram and KDE for each numeric column
for col in numeric_cols:
plt.figure(figsize=(10, 4))
plt.subplot(1, 2, 1)
sns.histplot(df[col].dropna(), bins=30, kde=False)
plt.title(f'Histogram of {col}')
plt.subplot(1, 2, 2)
sns.kdeplot(df[col].dropna(), shade=True)
plt.title(f'KDE Plot of {col}')
plt.tight_layout()
plt.show()
Skewness Summary: Inflation 75.489967 CPI 25.637506 Cardiovascular diseases 10.419131 GDP 8.488527 Sex ratio 7.718123 Diabetes 1.823593 Income 1.618565 Unemployment Rate 1.487679 Child mortality rate 1.458613 Incomplete tertiary education 1.154120 Median age 0.899402 Gini coefficient 0.820521 Cost of a healthy diet 0.642813 BMI (female) 0.257175 BMI (male) 0.065172 Year -0.002662 Life expectancy -0.691259 dtype: float64
/tmp/ipython-input-7-1776002204.py:27: FutureWarning: `shade` is now deprecated in favor of `fill`; setting `fill=True`. This will become an error in seaborn v0.14.0; please update your code. sns.kdeplot(df[col].dropna(), shade=True)
/tmp/ipython-input-7-1776002204.py:27: FutureWarning: `shade` is now deprecated in favor of `fill`; setting `fill=True`. This will become an error in seaborn v0.14.0; please update your code. sns.kdeplot(df[col].dropna(), shade=True)
/tmp/ipython-input-7-1776002204.py:27: FutureWarning: `shade` is now deprecated in favor of `fill`; setting `fill=True`. This will become an error in seaborn v0.14.0; please update your code. sns.kdeplot(df[col].dropna(), shade=True)
/tmp/ipython-input-7-1776002204.py:27: FutureWarning: `shade` is now deprecated in favor of `fill`; setting `fill=True`. This will become an error in seaborn v0.14.0; please update your code. sns.kdeplot(df[col].dropna(), shade=True)
/tmp/ipython-input-7-1776002204.py:27: FutureWarning: `shade` is now deprecated in favor of `fill`; setting `fill=True`. This will become an error in seaborn v0.14.0; please update your code. sns.kdeplot(df[col].dropna(), shade=True)
/tmp/ipython-input-7-1776002204.py:27: FutureWarning: `shade` is now deprecated in favor of `fill`; setting `fill=True`. This will become an error in seaborn v0.14.0; please update your code. sns.kdeplot(df[col].dropna(), shade=True)
/tmp/ipython-input-7-1776002204.py:27: FutureWarning: `shade` is now deprecated in favor of `fill`; setting `fill=True`. This will become an error in seaborn v0.14.0; please update your code. sns.kdeplot(df[col].dropna(), shade=True)
/tmp/ipython-input-7-1776002204.py:27: FutureWarning: `shade` is now deprecated in favor of `fill`; setting `fill=True`. This will become an error in seaborn v0.14.0; please update your code. sns.kdeplot(df[col].dropna(), shade=True)
/tmp/ipython-input-7-1776002204.py:27: FutureWarning: `shade` is now deprecated in favor of `fill`; setting `fill=True`. This will become an error in seaborn v0.14.0; please update your code. sns.kdeplot(df[col].dropna(), shade=True)
/tmp/ipython-input-7-1776002204.py:27: FutureWarning: `shade` is now deprecated in favor of `fill`; setting `fill=True`. This will become an error in seaborn v0.14.0; please update your code. sns.kdeplot(df[col].dropna(), shade=True)
/tmp/ipython-input-7-1776002204.py:27: FutureWarning: `shade` is now deprecated in favor of `fill`; setting `fill=True`. This will become an error in seaborn v0.14.0; please update your code. sns.kdeplot(df[col].dropna(), shade=True)
/tmp/ipython-input-7-1776002204.py:27: FutureWarning: `shade` is now deprecated in favor of `fill`; setting `fill=True`. This will become an error in seaborn v0.14.0; please update your code. sns.kdeplot(df[col].dropna(), shade=True)
/tmp/ipython-input-7-1776002204.py:27: FutureWarning: `shade` is now deprecated in favor of `fill`; setting `fill=True`. This will become an error in seaborn v0.14.0; please update your code. sns.kdeplot(df[col].dropna(), shade=True)
/tmp/ipython-input-7-1776002204.py:27: FutureWarning: `shade` is now deprecated in favor of `fill`; setting `fill=True`. This will become an error in seaborn v0.14.0; please update your code. sns.kdeplot(df[col].dropna(), shade=True)
/tmp/ipython-input-7-1776002204.py:27: FutureWarning: `shade` is now deprecated in favor of `fill`; setting `fill=True`. This will become an error in seaborn v0.14.0; please update your code. sns.kdeplot(df[col].dropna(), shade=True)
/tmp/ipython-input-7-1776002204.py:27: FutureWarning: `shade` is now deprecated in favor of `fill`; setting `fill=True`. This will become an error in seaborn v0.14.0; please update your code. sns.kdeplot(df[col].dropna(), shade=True)
/tmp/ipython-input-7-1776002204.py:27: FutureWarning: `shade` is now deprecated in favor of `fill`; setting `fill=True`. This will become an error in seaborn v0.14.0; please update your code. sns.kdeplot(df[col].dropna(), shade=True)
Boxplot¶
Boxplots is a good tool of offering a visual summary of the distribution, skewness, and variability for each numeric variable in the dataset.
# Boxplot
import seaborn as sns
import matplotlib.pyplot as plt
# Loop through all numeric columns to create boxplots
for col in df.select_dtypes(include='number').columns:
# Get the data for the current numeric column
column_data = df[col].dropna() # Drop NaN values to avoid potential issues with plotting
# Check if there is enough data for plotting (at least one non-null value)
if len(column_data) > 0:
sns.boxplot(x=column_data)
plt.title(f"Boxplot of {col}")
plt.xlabel(col)
plt.show()
else:
print(f"Not enough data to generate boxplot for column: {col}")
Spearman Correlation¶
# Spearman Correlation matrix and heatmap
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np # Import numpy for np.number
# Compute correlation matrix - Select only numeric columns
corr_method = 'spearman'
# Select only numeric columns for correlation calculation
df_numeric = df.select_dtypes(include=np.number)
corr_matrix = df_numeric.corr(method=corr_method)
# Plot heatmap
plt.figure(figsize=(12, 10))
sns.heatmap(corr_matrix, annot=True, fmt=".2f", cmap='coolwarm', square=True)
plt.title(f'{corr_method.capitalize()} Correlation Heatmap')
plt.show()
Feature Selection Comparison (Summary and Charts)¶
Compare Feature Selection method and the best nubmer of features using RMSE
# feature selection comparison
from sklearn.linear_model import Ridge
from sklearn.impute import SimpleImputer
def find_best_feature_count(X_df, y, max_features=None):
import numpy as np
import pandas as pd
from sklearn.linear_model import LassoCV, LinearRegression
from sklearn.feature_selection import RFE, SequentialFeatureSelector
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import TimeSeriesSplit
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import StandardScaler
feature_names = X_df.columns.tolist()
# 1. Impute missing values
imputer = SimpleImputer(strategy='mean') # or 'median', 'most_frequent'
X_imputed = imputer.fit_transform(X_df)
y_imputed = imputer.fit_transform(y.values.reshape(-1, 1)).ravel()
# --- Scale X and y ---
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_imputed)
y_imputed = y_imputed.reshape(-1, 1)
y_scaler = StandardScaler()
y_scaled = y_scaler.fit_transform(y_imputed).ravel()
y_original = y_imputed.ravel()
#y_scaled = y_scaler.fit_transform(y).ravel()
#y_original = y.ravel()
tscv = TimeSeriesSplit(n_splits=3)
def rmse_on_original_scale(model, X_subset):
y_preds, y_tests = [], []
for train_idx, test_idx in tscv.split(X_subset):
model.fit(X_subset[train_idx], y_scaled[train_idx])
y_pred_scaled = model.predict(X_subset[test_idx])
y_pred_original = y_scaler.inverse_transform(y_pred_scaled.reshape(-1, 1)).ravel()
y_preds.extend(y_pred_original)
y_tests.extend(y_original[test_idx])
return np.sqrt(mean_squared_error(y_tests, y_preds))
# --- Feature Selection ---
max_features = min(max_features or 20, X_scaled.shape[1] - 1)
lasso = LassoCV(cv=tscv, random_state=42).fit(X_scaled, y_scaled)
lasso_coef = lasso.coef_
rf_model = RandomForestRegressor(n_estimators=100, random_state=42)
rf_model.fit(X_scaled, y_scaled)
importances = rf_model.feature_importances_
lasso_rmse_list, rfe_rmse_list, sfs_rmse_list, rf_rmse_list = [], [], [], []
step = 2
for n in range(1, max_features + 1, step):
idx_lasso = np.argsort(np.abs(lasso_coef))[-n:]
X_lasso = X_scaled[:, idx_lasso]
lasso_rmse_list.append((n, rmse_on_original_scale(LinearRegression(), X_lasso)))
try:
rfe = RFE(LinearRegression(), n_features_to_select=n)
X_rfe = rfe.fit_transform(X_scaled, y_scaled)
rfe_rmse_list.append((n, rmse_on_original_scale(LinearRegression(), X_rfe)))
except:
rfe_rmse_list.append((n, np.nan))
try:
sfs = SequentialFeatureSelector(LinearRegression(), n_features_to_select=n, direction='forward', cv=tscv, n_jobs=-1)
X_sfs = sfs.fit_transform(X_scaled, y_scaled)
sfs_rmse_list.append((n, rmse_on_original_scale(LinearRegression(), X_sfs)))
except:
sfs_rmse_list.append((n, np.nan))
idx_rf = np.argsort(importances)[-n:]
X_rf = X_scaled[:, idx_rf]
rf_rmse_list.append((n, rmse_on_original_scale(LinearRegression(), X_rf)))
df_combined = (
pd.DataFrame(lasso_rmse_list, columns=['n_features', 'LASSO_RMSE'])
.merge(pd.DataFrame(rfe_rmse_list, columns=['n_features', 'RFE_RMSE']), on='n_features')
.merge(pd.DataFrame(sfs_rmse_list, columns=['n_features', 'Forward_RMSE']), on='n_features')
.merge(pd.DataFrame(rf_rmse_list, columns=['n_features', 'RF_RMSE']), on='n_features')
)
# --- Feature Names ---
best_lasso_n = df_combined.loc[df_combined['LASSO_RMSE'].idxmin(), 'n_features']
best_rfe_n = df_combined.loc[df_combined['RFE_RMSE'].idxmin(), 'n_features']
best_sfs_n = df_combined.loc[df_combined['Forward_RMSE'].idxmin(), 'n_features']
best_rf_n = df_combined.loc[df_combined['RF_RMSE'].idxmin(), 'n_features']
lasso_features = [feature_names[i] for i in np.argsort(np.abs(lasso_coef))[-best_lasso_n:]]
rfe = RFE(LinearRegression(), n_features_to_select=best_rfe_n).fit(X_scaled, y_scaled)
rfe_features = [feature_names[i] for i, flag in enumerate(rfe.support_) if flag]
sfs = SequentialFeatureSelector(LinearRegression(), n_features_to_select=best_sfs_n, direction='forward', cv=tscv).fit(X_scaled, y_scaled)
sfs_features = [feature_names[i] for i, flag in enumerate(sfs.get_support()) if flag]
rf_features = [feature_names[i] for i in np.argsort(importances)[-best_rf_n:]]
best_methods = {
'LASSO': {'n_features': best_lasso_n, 'rmse': df_combined.loc[df_combined['n_features'] == best_lasso_n, 'LASSO_RMSE'].values[0], 'features': lasso_features},
'RFE': {'n_features': best_rfe_n, 'rmse': df_combined.loc[df_combined['n_features'] == best_rfe_n, 'RFE_RMSE'].values[0], 'features': rfe_features},
'Forward': {'n_features': best_sfs_n, 'rmse': df_combined.loc[df_combined['n_features'] == best_sfs_n, 'Forward_RMSE'].values[0], 'features': sfs_features},
'RandomForest': {'n_features': best_rf_n, 'rmse': df_combined.loc[df_combined['n_features'] == best_rf_n, 'RF_RMSE'].values[0], 'features': rf_features}
}
return df_combined, best_methods
import matplotlib.pyplot as plt
target_cols = ['Cardiovascular diseases', 'Diabetes', 'Life expectancy']
results = {}
for target in target_cols:
lag_cols = [f'{target}_lag1', f'{target}_lag2']
cols_to_drop = target_cols + [col for col in lag_cols if col in df_lagged.columns]
X = df_lagged.drop(columns=cols_to_drop)
y = df_lagged[target]
print(f"\n🔍 Feature selection for target: {target}")
df_combined, best_methods = find_best_feature_count(X, y)
results[target] = {'df_combined': df_combined, 'best_methods': best_methods}
for method, info in best_methods.items():
print(f"\nMethod: {method}")
print(f"Best number of features: {info['n_features']}")
print(f"Best RMSE: {info['rmse']:.4f}")
print(f"Selected features: {info['features']}")
plt.figure(figsize=(10,6))
plt.plot(df_combined['n_features'], df_combined['LASSO_RMSE'], label='LASSO', marker='o')
plt.plot(df_combined['n_features'], df_combined['RFE_RMSE'], label='RFE', marker='s')
plt.plot(df_combined['n_features'], df_combined['Forward_RMSE'], label='Forward', marker='^')
plt.plot(df_combined['n_features'], df_combined['RF_RMSE'], label='Random Forest', marker='v')
plt.xlabel('Number of Features')
plt.ylabel('RMSE')
plt.title(f'RMSE vs Number of Features for Target: {target}')
plt.grid(True)
plt.legend()
plt.show()
🔍 Feature selection for target: Cardiovascular diseases Method: LASSO Best number of features: 3 Best RMSE: 144.6855 Selected features: ['BMI_avg_lag2', 'BMI_avg_lag3', 'GDP'] Method: RFE Best number of features: 3 Best RMSE: 144.5087 Selected features: ['Income', 'GDP', 'BMI_avg'] Method: Forward Best number of features: 11 Best RMSE: 145.4053 Selected features: ['Unemployment Rate', 'Incomplete tertiary education', 'GDP', 'Unemployment Rate_lag1', 'Unemployment Rate_lag2', 'Unemployment Rate_lag3', 'Incomplete tertiary education_lag1', 'Incomplete tertiary education_lag2', 'Incomplete tertiary education_lag3', 'GDP_lag1', 'lagged'] Method: RandomForest Best number of features: 3 Best RMSE: 145.3185 Selected features: ['GDP_lag1', 'GDP', 'Incomplete tertiary education']
🔍 Feature selection for target: Diabetes Method: LASSO Best number of features: 11 Best RMSE: 3.6356 Selected features: ['Incomplete tertiary education_lag3', 'Sex ratio_lag3', 'Incomplete tertiary education', 'Income_lag3', 'CPI', 'Median age_lag3', 'Cost of a healthy diet', 'Income', 'GDP', 'BMI_avg', 'BMI_avg_lag3'] Method: RFE Best number of features: 9 Best RMSE: 3.6424 Selected features: ['Cost of a healthy diet', 'Income', 'Incomplete tertiary education', 'GDP', 'CPI', 'BMI_avg', 'Income_lag3', 'Median age_lag3', 'BMI_avg_lag3'] Method: Forward Best number of features: 17 Best RMSE: 3.6281 Selected features: ['Income', 'Inflation', 'Child mortality rate', 'Incomplete tertiary education', 'Sex ratio', 'GDP', 'BMI_avg', 'Cost of a healthy diet_lag2', 'Income_lag1', 'Income_lag2', 'Income_lag3', 'Inflation_lag3', 'Sex ratio_lag3', 'CPI_lag3', 'BMI_avg_lag1', 'BMI_avg_lag3', 'lagged'] Method: RandomForest Best number of features: 17 Best RMSE: 3.6859 Selected features: ['CPI', 'Median age', 'Unemployment Rate_lag3', 'Inflation', 'Unemployment Rate_lag1', 'Median age_lag3', 'GDP_lag1', 'Cost of a healthy diet', 'Gini coefficient', 'Incomplete tertiary education', 'Unemployment Rate', 'GDP', 'Income', 'BMI_avg_lag2', 'BMI_avg_lag1', 'BMI_avg_lag3', 'BMI_avg']
🔍 Feature selection for target: Life expectancy Method: LASSO Best number of features: 5 Best RMSE: 3.5032 Selected features: ['Median age_lag3', 'Sex ratio', 'GDP', 'Child mortality rate_lag3', 'Child mortality rate'] Method: RFE Best number of features: 7 Best RMSE: 3.5002 Selected features: ['Child mortality rate', 'Sex ratio', 'GDP', 'Median age', 'Child mortality rate_lag2', 'Child mortality rate_lag3', 'Median age_lag3'] Method: Forward Best number of features: 17 Best RMSE: 3.4964 Selected features: ['Child mortality rate', 'Sex ratio', 'GDP', 'Median age', 'Child mortality rate_lag1', 'Child mortality rate_lag2', 'Child mortality rate_lag3', 'Sex ratio_lag1', 'Sex ratio_lag2', 'Sex ratio_lag3', 'GDP_lag1', 'GDP_lag2', 'GDP_lag3', 'Median age_lag2', 'BMI_avg_lag2', 'BMI_avg_lag3', 'lagged'] Method: RandomForest Best number of features: 9 Best RMSE: 3.5058 Selected features: ['BMI_avg', 'GDP', 'Sex ratio', 'Income', 'Median age_lag3', 'Median age', 'Child mortality rate_lag2', 'Child mortality rate_lag3', 'Child mortality rate']
Feature selection Comparison with R sq, MAPE, MSE (Summary and Charts)¶
Compare Feature selection and the best number of features using Metrics (R square, MAPE, MSE)
# Feature selection with R sq, MAPE, MSE
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LassoCV, LinearRegression
from sklearn.feature_selection import RFE, SequentialFeatureSelector
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import TimeSeriesSplit
from sklearn.metrics import mean_squared_error, mean_absolute_percentage_error, r2_score
from sklearn.preprocessing import StandardScaler
# Move plot_metrics function definition to the beginning
def plot_metrics(df_combined, target_name):
metrics = ['RMSE', 'MAPE', 'R2']
methods = ['LASSO', 'RFE', 'Forward', 'RandomForest']
for metric in metrics:
plt.figure(figsize=(10,6))
for method in methods:
# Check if the metric column exists for the method before plotting
metric_col = f'{method}_{metric}'
if metric_col in df_combined.columns:
plt.plot(df_combined['n_features'], df_combined[metric_col], label=method, marker='o')
else:
print(f"Warning: Metric column '{metric_col}' not found in DataFrame for plotting.")
plt.title(f'{metric} vs Number of Features ({target_name})')
plt.xlabel('Number of Features')
plt.ylabel(metric)
plt.legend()
plt.grid(True)
plt.show()
def evaluate_model(model, X_subset, y_scaled, y_original, y_scaler, tscv):
y_preds, y_tests = [], []
# Ensure X_subset and y_scaled have the same index for splitting
# Convert X_subset to DataFrame if it's numpy array to use index for splitting
if not isinstance(X_subset, pd.DataFrame):
# Assuming X_subset corresponds to the same rows as y_scaled
X_subset_df = pd.DataFrame(X_subset, index=pd.Series(y_scaled).index)
else:
X_subset_df = X_subset
for train_idx, test_idx in tscv.split(X_subset_df): # Use X_subset_df for splitting
# Select data using indices from the split
X_train, X_test = X_subset_df.iloc[train_idx], X_subset_df.iloc[test_idx]
y_train_scaled, y_test_scaled = y_scaled[train_idx], y_scaled[test_idx]
y_test_original = y_original[test_idx] # Select original y for test set
# Ensure there's data in train and test sets for the current fold
if len(X_train) > 0 and len(X_test) > 0:
try:
model.fit(X_train, y_train_scaled)
y_pred_scaled = model.predict(X_test)
y_pred_original = y_scaler.inverse_transform(y_pred_scaled.reshape(-1, 1)).ravel()
y_preds.extend(y_pred_original)
y_tests.extend(y_test_original)
except Exception as e:
print(f"Error during model fitting or prediction in a fold: {e}")
# Extend with NaNs or skip if error occurs in a fold
y_preds.extend([np.nan] * len(y_test_original))
y_tests.extend(y_test_original) # Still add the test actuals to keep lists aligned
# Calculate metrics only if y_tests and y_preds are not empty and don't contain NaNs/Infs
y_tests_cleaned = np.array(y_tests)
y_preds_cleaned = np.array(y_preds)
# Remove pairs where either actual or prediction is NaN/Inf
valid_indices = np.isfinite(y_tests_cleaned) & np.isfinite(y_preds_cleaned)
y_tests_cleaned = y_tests_cleaned[valid_indices]
y_preds_cleaned = y_preds_cleaned[valid_indices]
if len(y_tests_cleaned) > 0:
rmse = np.sqrt(mean_squared_error(y_tests_cleaned, y_preds_cleaned))
mape = mean_absolute_percentage_error(y_tests_cleaned, y_preds_cleaned)
r2 = r2_score(y_tests_cleaned, y_preds_cleaned)
else:
# Return NaN if no valid data points for metric calculation
rmse, mape, r2 = np.nan, np.nan, np.nan
return rmse, mape, r2
def find_best_features_with_metrics(X_df, y, max_features=None):
# Ensure X_df has a proper index for splitting in evaluate_model
if not isinstance(X_df.index, pd.MultiIndex):
# Assuming X_df came from df_lagged which has MultiIndex, try to restore it
# This might require passing the original index or ensuring X_df retains it
# For robustness, let's assume X_df needs an index for splitting
# A simpler approach might be to reset index in evaluate_model if it's numpy array
pass # Let's handle index in evaluate_model as it receives X_subset
X_scaler = StandardScaler()
# Fit scaler on X_df values, but keep X_df as DataFrame to retain index
X_scaled_values = X_scaler.fit_transform(X_df.values)
X_scaled_df = pd.DataFrame(X_scaled_values, columns=X_df.columns, index=X_df.index) # Recreate DataFrame with index
feature_names = X_scaled_df.columns.tolist()
y = y.values.reshape(-1, 1) # y is already a Series from df_lagged, convert to numpy array
y_original = y.ravel() # Keep original y values as numpy array
y_scaler = StandardScaler()
y_scaled = y_scaler.fit_transform(y).ravel() # Scale y
tscv = TimeSeriesSplit(n_splits=5)
max_features = min(max_features or 30, X_scaled_df.shape[1]) # Max features up to total features
# Handle case where there are no features
if X_scaled_df.shape[1] == 0:
print("No features available in X_df. Skipping feature selection.")
return pd.DataFrame(), {'LASSO': {'n_features': 0, 'rmse': np.nan, 'features': []},
'RFE': {'n_features': 0, 'rmse': np.nan, 'features': []},
'Forward': {'n_features': 0, 'rmse': np.nan, 'features': []},
'RandomForest': {'n_features': 0, 'rmse': np.nan, 'features': []}}
lasso = LassoCV(cv=tscv, random_state=42).fit(X_scaled_df, y_scaled)
lasso_coef = lasso.coef_
# Ensure Random Forest is fitted on X_scaled_df (DataFrame)
rf_model = RandomForestRegressor(n_estimators=100, random_state=42).fit(X_scaled_df, y_scaled)
importances = rf_model.feature_importances_
results = {'LASSO': [], 'RFE': [], 'Forward': [], 'RandomForest': []}
# Max features for loop should be <= total features
max_loop_features = min(max_features, X_scaled_df.shape[1])
for n in range(1, max_loop_features + 1):
# LASSO
idx = np.argsort(np.abs(lasso_coef))[-n:]
# Select columns using index from X_scaled_df
X_subset_lasso = X_scaled_df.iloc[:, idx]
# Pass DataFrame to evaluate_model
results['LASSO'].append((n, *evaluate_model(LinearRegression(), X_subset_lasso, y_scaled, y_original, y_scaler, tscv)))
# RFE
try:
# RFE requires n_features_to_select <= n_features
if n <= X_scaled_df.shape[1]:
rfe = RFE(LinearRegression(), n_features_to_select=n)
# Fit on X_scaled_df (DataFrame) and get transformed numpy array
X_subset_rfe_np = rfe.fit_transform(X_scaled_df, y_scaled)
# Pass numpy array to evaluate_model - evaluate_model handles conversion to DataFrame for splitting
results['RFE'].append((n, *evaluate_model(LinearRegression(), X_subset_rfe_np, y_scaled, y_original, y_scaler, tscv)))
else:
results['RFE'].append((n, np.nan, np.nan, np.nan))
except Exception as e:
print(f"RFE failed for n={n}: {e}")
results['RFE'].append((n, np.nan, np.nan, np.nan))
# Forward
try:
# SFS requires k_features <= n_features
if n <= X_scaled_df.shape[1]:
# Use X_scaled_df (DataFrame) for SFS fit
sfs = SequentialFeatureSelector(LinearRegression(), n_features_to_select=n, direction='forward', cv=tscv, n_jobs=-1)
# Fit on X_scaled_df (DataFrame) and get transformed numpy array
X_subset_sfs_np = sfs.fit_transform(X_scaled_df, y_scaled)
# Pass numpy array to evaluate_model
results['Forward'].append((n, *evaluate_model(LinearRegression(), X_subset_sfs_np, y_scaled, y_original, y_scaler, tscv)))
else:
results['Forward'].append((n, np.nan, np.nan, np.nan))
except Exception as e:
print(f"Forward Selection failed for n={n}: {e}")
results['Forward'].append((n, np.nan, np.nan, np.nan))
# RF Importance
idx = np.argsort(importances)[-n:]
# Select columns using index from X_scaled_df
X_subset_rf = X_scaled_df.iloc[:, idx]
# Pass DataFrame to evaluate_model
results['RandomForest'].append((n, *evaluate_model(LinearRegression(), X_subset_rf, y_scaled, y_original, y_scaler, tscv)))
# Build metrics DataFrame
dfs = []
for method, vals in results.items():
df = pd.DataFrame(vals, columns=['n_features', f'{method}_RMSE', f'{method}_MAPE', f'{method}_R2'])
dfs.append(df)
df_combined = dfs[0]
for df in dfs[1:]:
df_combined = df_combined.merge(df, on='n_features', how='outer')
return df_combined
# Assuming df_lagged is available and contains the data with lags
# Assuming target_cols is defined
target_cols = ['Life expectancy', 'Cardiovascular diseases', 'Diabetes']
results = {}
for target in target_cols:
# Ensure df_lagged is available and contains the target column
if 'df_lagged' in locals() and target in df_lagged.columns:
lag_cols = [f'{target}_lag1', f'{target}_lag2']
# Ensure we only try to drop columns that exist in df_lagged
cols_to_drop = [target] + [col for col in lag_cols if col in df_lagged.columns]
# Select features for X - drop target(s) and their lags
X = df_lagged.drop(columns=cols_to_drop)
# Select the current target variable and drop NaNs
y = df_lagged[target].dropna()
# Align X with the cleaned y by index
X = X.loc[y.index]
# Ensure X is not empty after aligning with y
if X.empty:
print(f"No valid data points after dropping NaNs for target: {target}. Skipping evaluation.")
results[target] = pd.DataFrame() # Store an empty DataFrame
continue
print(f"\n🔍 Evaluating for target: {target}")
# Pass X as a DataFrame and y as a Series (without NaNs)
df_metrics = find_best_features_with_metrics(X, y)
results[target] = df_metrics
# Plot metrics for the current target only if df_metrics is not empty
if not df_metrics.empty:
plot_metrics(df_metrics, target)
else:
print(f"No metrics to plot for target: {target}.")
else:
print(f"df_lagged or target column '{target}' not found. Skipping evaluation for this target.")
🔍 Evaluating for target: Life expectancy
🔍 Evaluating for target: Cardiovascular diseases
/usr/local/lib/python3.11/dist-packages/sklearn/linear_model/_coordinate_descent.py:681: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 1.3344987559357833, tolerance: 1.2543280301043949 model = cd_fast.enet_coordinate_descent_gram( /usr/local/lib/python3.11/dist-packages/sklearn/linear_model/_coordinate_descent.py:681: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 1.3041213861906726, tolerance: 1.2543280301043949 model = cd_fast.enet_coordinate_descent_gram( /usr/local/lib/python3.11/dist-packages/sklearn/linear_model/_coordinate_descent.py:681: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 1.2939728821220342, tolerance: 1.2543280301043949 model = cd_fast.enet_coordinate_descent_gram( /usr/local/lib/python3.11/dist-packages/sklearn/linear_model/_coordinate_descent.py:681: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 1.291560398247384, tolerance: 1.2543280301043949 model = cd_fast.enet_coordinate_descent_gram( /usr/local/lib/python3.11/dist-packages/sklearn/linear_model/_coordinate_descent.py:681: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 1.2833807594306563, tolerance: 1.2543280301043949 model = cd_fast.enet_coordinate_descent_gram( /usr/local/lib/python3.11/dist-packages/sklearn/linear_model/_coordinate_descent.py:681: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 1.2738431218404003, tolerance: 1.2543280301043949 model = cd_fast.enet_coordinate_descent_gram( /usr/local/lib/python3.11/dist-packages/sklearn/linear_model/_coordinate_descent.py:681: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 1.263577498113591, tolerance: 1.2543280301043949 model = cd_fast.enet_coordinate_descent_gram( /usr/local/lib/python3.11/dist-packages/sklearn/linear_model/_coordinate_descent.py:681: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 1.703431344112687, tolerance: 1.5837940758962923 model = cd_fast.enet_coordinate_descent_gram( /usr/local/lib/python3.11/dist-packages/sklearn/linear_model/_coordinate_descent.py:681: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 1.6969510646304116, tolerance: 1.5837940758962923 model = cd_fast.enet_coordinate_descent_gram( /usr/local/lib/python3.11/dist-packages/sklearn/linear_model/_coordinate_descent.py:681: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 1.6853641250145301, tolerance: 1.5837940758962923 model = cd_fast.enet_coordinate_descent_gram( /usr/local/lib/python3.11/dist-packages/sklearn/linear_model/_coordinate_descent.py:681: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 1.6726185397310473, tolerance: 1.5837940758962923 model = cd_fast.enet_coordinate_descent_gram( /usr/local/lib/python3.11/dist-packages/sklearn/linear_model/_coordinate_descent.py:681: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 1.6593221832208656, tolerance: 1.5837940758962923 model = cd_fast.enet_coordinate_descent_gram( /usr/local/lib/python3.11/dist-packages/sklearn/linear_model/_coordinate_descent.py:681: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 1.6457220310112461, tolerance: 1.5837940758962923 model = cd_fast.enet_coordinate_descent_gram( /usr/local/lib/python3.11/dist-packages/sklearn/linear_model/_coordinate_descent.py:681: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 1.631997326414421, tolerance: 1.5837940758962923 model = cd_fast.enet_coordinate_descent_gram( /usr/local/lib/python3.11/dist-packages/sklearn/linear_model/_coordinate_descent.py:681: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 1.618292309503886, tolerance: 1.5837940758962923 model = cd_fast.enet_coordinate_descent_gram( /usr/local/lib/python3.11/dist-packages/sklearn/linear_model/_coordinate_descent.py:681: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 1.6047235924033885, tolerance: 1.5837940758962923 model = cd_fast.enet_coordinate_descent_gram(
🔍 Evaluating for target: Diabetes
Comparative Summary Table - Feature Selection with metrics (RMSE, MAPE, and R²)¶
## The best Feature Selection with different metrics TABLE - REVISED - Seed not fixed
# Install tabulate if needed
!pip install tabulate
from sklearn.linear_model import Ridge, LassoCV
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SequentialFeatureSelector, RFE
from sklearn.model_selection import TimeSeriesSplit
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.preprocessing import StandardScaler
from tabulate import tabulate
import pandas as pd
import numpy as np
from sklearn.impute import SimpleImputer # Import Imputer
# Main function to calculate metrics for a given set of features
def calculate_metrics_for_features(X_df, y, feature_indices):
# Impute missing values in X_df
imputer = SimpleImputer(strategy='mean')
X_df_imputed = pd.DataFrame(imputer.fit_transform(X_df), columns=X_df.columns, index=X_df.index)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_df_imputed) # Use imputed data here
y = y.values.reshape(-1, 1)
y_scaler = StandardScaler().fit(y)
y_scaled = y_scaler.transform(y).ravel()
y_original = y.ravel()
tscv = TimeSeriesSplit(n_splits=3)
X_subset = X_scaled[:, feature_indices]
y_preds, y_tests = [], []
for train_idx, test_idx in tscv.split(X_subset):
model = Ridge()
model.fit(X_subset[train_idx], y_scaled[train_idx])
pred = model.predict(X_subset[test_idx])
y_pred = y_scaler.inverse_transform(pred.reshape(-1, 1)).ravel()
y_preds.extend(y_pred)
y_tests.extend(y_original[test_idx])
return (
np.sqrt(mean_squared_error(y_tests, y_preds)),
mean_absolute_error(y_tests, y_preds),
r2_score(y_tests, y_preds)
)
# Extract Best Results per Method from the 'results' dictionary
def extract_best_per_method(results_dict, X_data_for_targets, y_data_for_targets):
summary = []
for target, target_results in results_dict.items():
df_combined = target_results['df_combined']
best_methods_info = target_results['best_methods']
X_target = X_data_for_targets[target] # Get the correct X for this target
y_target = y_data_for_targets[target] # Get the correct y for this target
for method, info in best_methods_info.items():
n_features = info['n_features']
selected_feature_names = info['features']
# Get the indices of the selected features from the X_target DataFrame columns
try:
# Ensure selected_feature_names are in the columns of X_target
valid_selected_features = [col for col in selected_feature_names if col in X_target.columns]
feature_indices = [X_target.columns.get_loc(col) for col in valid_selected_features]
except KeyError as e:
print(f"Error: Feature '{e}' not found in original DataFrame columns for target {target}, method {method}. Skipping.")
continue # Skip this combination if features are not found
if n_features > 0 and feature_indices:
# Calculate metrics using the selected features and the correct X_target and y_target
# Pass the subset of X_target using the valid_selected_features column names
rmse, mae, r2 = calculate_metrics_for_features(X_target[valid_selected_features], y_target, list(range(len(valid_selected_features)))) # Pass indices relative to the subset
summary.append({
'Target': target,
'Method': method,
'n_features': len(valid_selected_features), # Use the count of valid features
'RMSE': round(rmse, 2),
'MAE': round(mae, 2),
'R²': round(r2, 4)
})
elif n_features == 0:
# Handle case with 0 features if necessary, although typically we select at least 1
summary.append({
'Target': target,
'Method': method,
'n_features': 0,
'RMSE': np.nan, # Or a baseline metric if applicable
'MAE': np.nan,
'R²': np.nan
})
return pd.DataFrame(summary)
# Assuming df_lagged is available from previous steps
# Prepare the X and y dataframes for each target as they were used in the feature selection loop
target_cols = ['Life expectancy', 'Cardiovascular diseases', 'Diabetes']
X_data_for_targets = {}
y_data_for_targets = {}
if 'df_lagged' in locals():
for target in target_cols:
if target in df_lagged.columns:
lag_cols = [f'{target}_lag1', f'{target}_lag2']
cols_to_drop = [target] + [col for col in lag_cols if col in df_lagged.columns]
X = df_lagged.drop(columns=cols_to_drop)
y = df_lagged[target].dropna() # Use the y with NaNs dropped as in the previous cell
# Align X with the cleaned y by index
X = X.loc[y.index]
X_data_for_targets[target] = X
y_data_for_targets[target] = y
else:
print(f"Target column '{target}' not found in df_lagged. Cannot prepare data for this target.")
# Extract Best Results per Method
# Use the 'results' dictionary generated from the previous cell's execution and the prepared X and y data
if 'results' in locals() and results and X_data_for_targets and y_data_for_targets:
best_performance_df = extract_best_per_method(results, X_data_for_targets, y_data_for_targets)
# Print Final Table
if not best_performance_df.empty:
print("\nBest Performance per Method\n")
print(tabulate(best_performance_df, headers='keys', tablefmt='fancy_grid', showindex=False))
else:
print("\nNo best performance results to display.")
else:
print("\n'results' dictionary, X_data_for_targets, or y_data_for_targets not found or is empty. Please run the feature selection cell first and ensure data is prepared correctly.")
# export and download file
best_performance_df.to_csv("best_feature_selection_summary.csv", index=False)
from google.colab import files
files.download("best_feature_selection_summary.csv")
Requirement already satisfied: tabulate in /usr/local/lib/python3.11/dist-packages (0.9.0) Best Performance per Method ╒═════════════════════════╤══════════════╤══════════════╤════════╤═══════╤═════════╕ │ Target │ Method │ n_features │ RMSE │ MAE │ R² │ ╞═════════════════════════╪══════════════╪══════════════╪════════╪═══════╪═════════╡ │ Cardiovascular diseases │ LASSO │ 3 │ 144.69 │ 37.6 │ 0.0043 │ ├─────────────────────────┼──────────────┼──────────────┼────────┼───────┼─────────┤ │ Cardiovascular diseases │ RFE │ 3 │ 144.51 │ 38.52 │ 0.0067 │ ├─────────────────────────┼──────────────┼──────────────┼────────┼───────┼─────────┤ │ Cardiovascular diseases │ Forward │ 11 │ 145.41 │ 37.95 │ -0.0056 │ ├─────────────────────────┼──────────────┼──────────────┼────────┼───────┼─────────┤ │ Cardiovascular diseases │ RandomForest │ 3 │ 145.32 │ 37.89 │ -0.0044 │ ├─────────────────────────┼──────────────┼──────────────┼────────┼───────┼─────────┤ │ Diabetes │ LASSO │ 11 │ 3.64 │ 2.6 │ 0.4649 │ ├─────────────────────────┼──────────────┼──────────────┼────────┼───────┼─────────┤ │ Diabetes │ RFE │ 9 │ 3.64 │ 2.6 │ 0.4629 │ ├─────────────────────────┼──────────────┼──────────────┼────────┼───────┼─────────┤ │ Diabetes │ Forward │ 17 │ 3.63 │ 2.58 │ 0.4671 │ ├─────────────────────────┼──────────────┼──────────────┼────────┼───────┼─────────┤ │ Diabetes │ RandomForest │ 17 │ 3.69 │ 2.62 │ 0.45 │ ├─────────────────────────┼──────────────┼──────────────┼────────┼───────┼─────────┤ │ Life expectancy │ LASSO │ 5 │ 3.5 │ 2.69 │ 0.9133 │ ├─────────────────────────┼──────────────┼──────────────┼────────┼───────┼─────────┤ │ Life expectancy │ RFE │ 7 │ 3.5 │ 2.68 │ 0.9134 │ ├─────────────────────────┼──────────────┼──────────────┼────────┼───────┼─────────┤ │ Life expectancy │ Forward │ 17 │ 3.5 │ 2.68 │ 0.9136 │ ├─────────────────────────┼──────────────┼──────────────┼────────┼───────┼─────────┤ │ Life expectancy │ RandomForest │ 9 │ 3.51 │ 2.68 │ 0.9132 │ ╘═════════════════════════╧══════════════╧══════════════╧════════╧═══════╧═════════╛
Base on the result of the table, the following Feature Selection method and number of features will be used in this study as follows:
Life Expectancy - Forward Selection - # of features = 17
Cardiovascular Diseases - RFE - # of features = 3
Diabetes - Forward Selection - # of features = 17
# Feature Importance Table - REVISED
from sklearn.linear_model import LinearRegression, LassoCV
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler
import pandas as pd
import numpy as np
# === Setup
targets = ['Cardiovascular diseases', 'Diabetes', 'Life expectancy']
methods = ['Forward', 'RFE', 'LASSO', 'Random Forest']
all_features = [
'Child mortality rate', 'Child mortality rate_lag3', 'GDP', 'Sex ratio_lag1', 'Unemployment Rate_lag1',
'Cost of a healthy diet', 'Child mortality rate_lag2', 'Sex ratio', 'Sex ratio_lag2', 'Unemployment Rate',
'Unemployment Rate_lag3', 'Unemployment Rate_lag2', 'Sex ratio_lag3', 'Cost of a healthy diet_lag1',
'Cost of a healthy diet_lag2', 'Cost of a healthy diet_lag3', 'GDP_lag1','BMI_avg_lag3', 'Income',
'GDP_lag3', 'Median age_lag3', 'Income_lag3', 'Income_lag2', 'Income_lag1', 'GDP_lag2', 'BMI_avg_lag2',
'BMI_avg_lag1', 'BMI_avg','Incomplete tertiary education'
]
# === Initialize importance table
multi_method_importance = pd.DataFrame(index=all_features,
columns=pd.MultiIndex.from_product([targets, methods]),
dtype=float).fillna(0.0)
# === Function: Standardization and prep
def prepare_data(target, features):
valid_features = [f for f in features if f in df_lagged.columns]
df_temp = df_lagged[valid_features + [target]].dropna()
X = df_temp[valid_features]
y = df_temp[target]
X_scaled = pd.DataFrame(StandardScaler().fit_transform(X), columns=X.columns)
return X_scaled, y, valid_features
# === FORWARD SELECTION
def run_forward(X, y, valid_features, max_features):
selected = []
remaining = valid_features.copy()
for _ in range(min(max_features, len(remaining))):
scores = {}
for f in remaining:
trial = selected + [f]
model = LinearRegression()
neg_mse = cross_val_score(model, X[trial], y,
scoring='neg_mean_squared_error', cv=5)
rmse = np.mean(np.sqrt(-neg_mse))
scores[f] = rmse
best_feature = min(scores, key=scores.get)
selected.append(best_feature)
remaining.remove(best_feature)
for f in selected:
multi_method_importance.loc[f, (target, 'Forward')] = 1
# === RFE
def run_rfe(X, y, valid_features, num_features):
model = LinearRegression()
selector = RFE(model, n_features_to_select=num_features)
selector = selector.fit(X, y)
for f, support in zip(valid_features, selector.support_):
if support:
multi_method_importance.loc[f, (target, 'RFE')] = 1
# === LASSO
def run_lasso(X, y):
model = LassoCV(cv=5, random_state=42)
model.fit(X, y)
for i, f in enumerate(X.columns):
multi_method_importance.loc[f, (target, 'LASSO')] = round(abs(model.coef_[i]), 4)
# === RANDOM FOREST
def run_rf(X, y):
rf = RandomForestRegressor(n_estimators=100, random_state=42)
rf.fit(X, y)
importances = rf.feature_importances_
for i, f in enumerate(X.columns):
multi_method_importance.loc[f, (target, 'Random Forest')] = round(importances[i], 4)
# === Run loop for all targets
for target in targets:
if target not in df_lagged.columns:
print(f"⚠️ Skipping {target} — not found in dataset.")
continue
X_scaled, y, valid = prepare_data(target, all_features)
run_forward(X_scaled, y, valid, max_features=17 if target != 'Cardiovascular diseases' else 3)
run_rfe(X_scaled, y, valid, num_features=3)
run_lasso(X_scaled, y)
run_rf(X_scaled, y)
# === Display styled table
styled_multi_table = multi_method_importance.style \
.set_caption("📊 Multi-Method Feature Importance Table") \
.format(precision=4) \
.set_table_styles([
{'selector': 'table', 'props': [('border-collapse', 'collapse'),
('border', '1px solid black')]},
{'selector': 'th, td', 'props': [('border', '1px solid black'), ('padding', '4px')]}
])
# Display multi table
# Summary table combining the feature importance score with descending order for each target
for target in targets:
col_forward = (target, 'Forward')
col_rfe = (target, 'RFE')
col_lasso = (target, 'LASSO')
col_rf = (target, 'Random Forest')
combined_name = (target, 'Combined')
multi_method_importance[combined_name] = (
multi_method_importance[col_forward].fillna(0) +
multi_method_importance[col_rfe].fillna(0) +
multi_method_importance[col_lasso].fillna(0) +
multi_method_importance[col_rf].fillna(0)
)
# === Reorder for display: sort by combined score for each target
for target in targets:
sort_col = (target, 'Combined')
sorted_features = multi_method_importance.sort_values(by=sort_col, ascending=False).index
multi_method_importance = multi_method_importance.loc[sorted_features]
# === Display enhanced table
styled_combined_table = multi_method_importance.style \
.set_caption("⭐ Enhanced Feature Importance Comparison (4 Methods + Combined)") \
.format(precision=4) \
.set_table_styles([
{'selector': 'table', 'props': [('border-collapse', 'collapse'), ('border', '1px solid black')]},
{'selector': 'th, td', 'props': [('border', '1px solid black'), ('padding', '5px')]}
])
display(styled_combined_table)
# Save the file
multi_method_importance.to_csv("enhanced_feature_importance_comparison.csv", index=True)
# Download the file (include full path)
from google.colab import files
files.download('enhanced_feature_importance_comparison.csv')
| Cardiovascular diseases | Diabetes | Life expectancy | Cardiovascular diseases | Diabetes | Life expectancy | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Forward | RFE | LASSO | Random Forest | Forward | RFE | LASSO | Random Forest | Forward | RFE | LASSO | Random Forest | Combined | Combined | Combined | |
| Child mortality rate | 0.0000 | 0.0000 | 0.0000 | 0.0031 | 0.0000 | 0.0000 | 0.0000 | 0.0083 | 1.0000 | 1.0000 | 18.7165 | 0.9229 | 0.0031 | 0.0083 | 21.6394 |
| Child mortality rate_lag3 | 0.0000 | 0.0000 | 0.0000 | 0.0018 | 0.0000 | 0.0000 | 0.1466 | 0.0060 | 1.0000 | 1.0000 | 7.5788 | 0.0210 | 0.0018 | 0.1526 | 9.5998 |
| Child mortality rate_lag2 | 0.0000 | 0.0000 | 0.0000 | 0.0027 | 0.0000 | 0.0000 | 0.0000 | 0.0054 | 1.0000 | 1.0000 | 0.0000 | 0.0053 | 0.0027 | 0.0054 | 2.0053 |
| Income_lag3 | 0.0000 | 0.0000 | 0.0000 | 0.0666 | 1.0000 | 0.0000 | 0.3318 | 0.0109 | 1.0000 | 0.0000 | 0.3482 | 0.0020 | 0.0666 | 1.3427 | 1.3502 |
| GDP | 0.0000 | 1.0000 | 0.0000 | 0.0878 | 1.0000 | 1.0000 | 0.3975 | 0.0254 | 1.0000 | 0.0000 | 0.3190 | 0.0021 | 1.0878 | 2.4229 | 1.3211 |
| BMI_avg | 1.0000 | 0.0000 | 0.0000 | 0.0062 | 0.0000 | 0.0000 | 0.0000 | 0.0106 | 1.0000 | 0.0000 | 0.1729 | 0.0026 | 1.0062 | 0.0106 | 1.1755 |
| Sex ratio | 0.0000 | 0.0000 | 0.0000 | 0.0076 | 1.0000 | 0.0000 | 0.0923 | 0.0054 | 1.0000 | 0.0000 | 0.1535 | 0.0024 | 0.0076 | 1.0977 | 1.1559 |
| Median age_lag3 | 0.0000 | 0.0000 | 0.0000 | 0.0060 | 1.0000 | 0.0000 | 0.3257 | 0.0281 | 1.0000 | 0.0000 | 0.1433 | 0.0073 | 0.0060 | 1.3538 | 1.1506 |
| Sex ratio_lag3 | 0.0000 | 0.0000 | 0.0000 | 0.0262 | 1.0000 | 0.0000 | 0.1618 | 0.0071 | 1.0000 | 0.0000 | 0.0936 | 0.0023 | 0.0262 | 1.1689 | 1.0959 |
| Sex ratio_lag2 | 0.0000 | 0.0000 | 0.0000 | 0.0108 | 1.0000 | 0.0000 | 0.0150 | 0.0048 | 1.0000 | 0.0000 | 0.0793 | 0.0019 | 0.0108 | 1.0198 | 1.0812 |
| Sex ratio_lag1 | 0.0000 | 0.0000 | 0.0000 | 0.0095 | 1.0000 | 0.0000 | 0.0165 | 0.0045 | 1.0000 | 0.0000 | 0.0246 | 0.0016 | 0.0095 | 1.0210 | 1.0262 |
| BMI_avg_lag3 | 1.0000 | 1.0000 | 0.0000 | 0.0558 | 1.0000 | 1.0000 | 3.3170 | 0.5842 | 1.0000 | 0.0000 | 0.0000 | 0.0018 | 2.0558 | 5.9012 | 1.0018 |
| GDP_lag3 | 0.0000 | 0.0000 | 0.0000 | 0.1160 | 1.0000 | 0.0000 | 0.2146 | 0.0263 | 1.0000 | 0.0000 | 0.0000 | 0.0016 | 0.1160 | 1.2409 | 1.0016 |
| GDP_lag1 | 0.0000 | 0.0000 | 0.0000 | 0.0272 | 1.0000 | 0.0000 | 0.0000 | 0.0150 | 1.0000 | 0.0000 | 0.0000 | 0.0013 | 0.0272 | 1.0150 | 1.0013 |
| BMI_avg_lag1 | 1.0000 | 0.0000 | 0.0000 | 0.0098 | 0.0000 | 0.0000 | 0.0000 | 0.0083 | 1.0000 | 0.0000 | 0.0000 | 0.0013 | 1.0098 | 0.0083 | 1.0013 |
| GDP_lag2 | 0.0000 | 0.0000 | 0.0000 | 0.0206 | 1.0000 | 0.0000 | 0.0000 | 0.0172 | 1.0000 | 0.0000 | 0.0000 | 0.0012 | 0.0206 | 1.0172 | 1.0012 |
| BMI_avg_lag2 | 0.0000 | 0.0000 | 0.0000 | 0.0165 | 0.0000 | 0.0000 | 0.0000 | 0.0103 | 1.0000 | 0.0000 | 0.0000 | 0.0011 | 0.0165 | 0.0103 | 1.0011 |
| Income | 0.0000 | 1.0000 | 0.0000 | 0.0522 | 1.0000 | 1.0000 | 0.5190 | 0.0574 | 0.0000 | 0.0000 | 0.3121 | 0.0025 | 1.0522 | 2.5764 | 0.3146 |
| Unemployment Rate | 0.0000 | 0.0000 | 0.0000 | 0.0105 | 0.0000 | 0.0000 | 0.0432 | 0.0156 | 0.0000 | 0.0000 | 0.0668 | 0.0025 | 0.0105 | 0.0588 | 0.0693 |
| Cost of a healthy diet | 0.0000 | 0.0000 | 0.0000 | 0.0170 | 1.0000 | 0.0000 | 0.3548 | 0.0223 | 0.0000 | 0.0000 | 0.0505 | 0.0013 | 0.0170 | 1.3771 | 0.0518 |
| Incomplete tertiary education | 0.0000 | 0.0000 | 0.0000 | 0.2616 | 0.0000 | 0.0000 | 0.3349 | 0.0377 | 0.0000 | 0.0000 | 0.0203 | 0.0043 | 0.2616 | 0.3726 | 0.0246 |
| Unemployment Rate_lag3 | 0.0000 | 0.0000 | 0.0000 | 0.0230 | 0.0000 | 0.0000 | 0.0348 | 0.0190 | 0.0000 | 0.0000 | 0.0000 | 0.0014 | 0.0230 | 0.0538 | 0.0014 |
| Unemployment Rate_lag1 | 0.0000 | 0.0000 | 0.0000 | 0.0099 | 0.0000 | 0.0000 | 0.0002 | 0.0112 | 0.0000 | 0.0000 | 0.0000 | 0.0013 | 0.0099 | 0.0114 | 0.0013 |
| Cost of a healthy diet_lag3 | 0.0000 | 0.0000 | 0.0000 | 0.0134 | 0.0000 | 0.0000 | 0.0383 | 0.0104 | 0.0000 | 0.0000 | 0.0000 | 0.0013 | 0.0134 | 0.0487 | 0.0013 |
| Income_lag2 | 0.0000 | 0.0000 | 0.0000 | 0.0620 | 1.0000 | 0.0000 | 0.0522 | 0.0116 | 0.0000 | 0.0000 | 0.0000 | 0.0012 | 0.0620 | 1.0638 | 0.0012 |
| Cost of a healthy diet_lag2 | 0.0000 | 0.0000 | 0.0000 | 0.0174 | 1.0000 | 0.0000 | 0.0367 | 0.0054 | 0.0000 | 0.0000 | 0.0000 | 0.0011 | 0.0174 | 1.0421 | 0.0011 |
| Income_lag1 | 0.0000 | 0.0000 | 0.0000 | 0.0371 | 1.0000 | 0.0000 | 0.0774 | 0.0147 | 0.0000 | 0.0000 | 0.0000 | 0.0011 | 0.0371 | 1.0921 | 0.0011 |
| Unemployment Rate_lag2 | 0.0000 | 0.0000 | 0.0000 | 0.0124 | 0.0000 | 0.0000 | 0.0525 | 0.0104 | 0.0000 | 0.0000 | 0.0000 | 0.0011 | 0.0124 | 0.0629 | 0.0011 |
| Cost of a healthy diet_lag1 | 0.0000 | 0.0000 | 0.0000 | 0.0094 | 1.0000 | 0.0000 | 0.0000 | 0.0066 | 0.0000 | 0.0000 | 0.0000 | 0.0011 | 0.0094 | 1.0066 | 0.0011 |
Refer to the above table - The feature importance analysis across multiple selection methods—Forward Selection, Recursive Feature Elimination (RFE), LASSO, and Random Forest—reveals distinct and insightful patterns in how different variables relate to three key health outcomes: life expectancy, diabetes, and cardiovascular diseases (CVD). For life expectancy, the most dominant predictor is child mortality rate, including its lagged versions. This finding underscores a strong inverse relationship between child mortality and life expectancy, highlighting the long-term benefits of improving early childhood health. Other notable features include lagged BMI averages and socioeconomic indicators such as income and sex ratio, suggesting that both historical health trends and broader demographic factors influence longevity.
In the case of diabetes, the most influential feature is BMI_avg_lag3, indicating that higher BMI levels from three years prior are a strong predictor of diabetes prevalence. This reflects the chronic and gradual development of diabetes linked to long-term obesity. Socioeconomic factors like income, the cost of a healthy diet, and lagged income variables also emerge as important predictors, suggesting that financial access to healthy food and lifestyle conditions are significant contributors. Additionally, lagged sex ratio and income highlight the delayed impact of gender distribution and earnings on diabetes rates.
For cardiovascular diseases, BMI_avg_lag3 again stands out as a major factor, pointing to obesity’s long-term role in heart-related conditions. Uniquely, education-related features, such as incomplete tertiary education, show relevance only for CVD, indicating that educational attainment may influence cardiovascular health through awareness, healthcare access, or lifestyle choices. Economic indicators like GDP and income also contribute but to a lesser extent than for diabetes or life expectancy.
Across all targets, lagged variables consistently outperform current-year features, emphasizing the delayed effects of socioeconomic and health conditions on public health outcomes. For instance, lagged income, BMI, and child mortality often provide stronger predictive power than their contemporaneous counterparts. This suggests that interventions in health or economic policy may take several years to manifest in population health metrics, reinforcing the need for long-term planning in public health strategies.
The results also align well with potential research questions. Firstly, key predictors for each outcome were identified, showing that health outcomes are driven by a mix of socio-economic, demographic, and lagged health indicators. Secondly, the importance of lagged features strongly supports the hypothesis that delayed effects exist and can be captured through temporal modeling. Lastly, certain predictors such as BMI_avg_lag3, income, and child mortality prove robust across multiple selection methods, confirming their consistent relevance.
In summary, this feature importance analysis not only highlights the leading drivers of life expectancy, diabetes, and cardiovascular diseases but also reveals the critical role of historical data in shaping current health outcomes. These insights provide valuable guidance for public health planning, suggesting that investments in early-life health, economic accessibility, and education can yield significant long-term benefits across diverse health indicators.
Feature Importance Plot¶
# Bar Plot top features per target based on combined score
# === Custom color map
custom_colors = {
'Cardiovascular diseases': 'mediumseagreen',
'Diabetes': 'darkorange',
'Life expectancy': 'cornflowerblue'
}
targets = ['Cardiovascular diseases', 'Diabetes', 'Life expectancy']
# === Plot top features per target with color
top_n = 10 # adjust as needed
for target in targets:
combined_col = (target, 'Combined')
df_top = multi_method_importance[[combined_col]].copy()
df_top.columns = ['Combined Score']
df_top = df_top.sort_values(by='Combined Score', ascending=False).head(top_n)
plt.figure(figsize=(10, 6))
sns.barplot(x='Combined Score', y=df_top.index, data=df_top, color=custom_colors[target])
plt.title(f"Top {top_n} Features for {target} (Combined Score)", fontsize=14)
plt.xlabel("Combined Importance")
plt.ylabel("Feature")
plt.tight_layout()
plt.show()
The Augmented Dickey-Fuller (ADF) test results provide critical insight into the time series characteristics of the three key health outcomes in this study: life expectancy, diabetes, and cardiovascular diseases. All three variables demonstrate strong stationarity, as indicated by highly negative ADF statistics (e.g., -18.54 for life expectancy, -12.63 for cardiovascular diseases, and -12.32 for diabetes) and extremely low p-values (all near zero). These values are well below conventional significance thresholds (0.01 or 0.05), confirming that the time series are stationary—that is, their statistical properties such as mean and variance remain stable over time.rmance over time.
ACF and PACF plot¶
ACF (Autocorrelation Function) and PACF (Partial Autocorrelation Function) plots are visual tools used to analyze the correlation structure of time series data. They help identify patterns and dependencies between data points at different lags (time intervals) and are crucial for determining appropriate models for time series forecasting, particularly AR (Autoregressive) and MA (Moving Average) models.
# ACF and PACF plot
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
# Assuming df_lagged is your dataset and contains time-series data
target_cols = ['Life expectancy', 'Cardiovascular diseases', 'Diabetes']
for target in target_cols:
series = df_lagged[target].dropna()
fig, ax = plt.subplots(2, 1, figsize=(10, 8), sharex=True)
fig.suptitle(f'ACF and PACF for {target}', fontsize=16)
plot_acf(series, lags=40, ax=ax[0])
ax[0].set_title(f'Autocorrelation (ACF) - {target}')
ax[0].set_ylabel('ACF')
plot_pacf(series, lags=40, ax=ax[1], method='ywm')
ax[1].set_title(f'Partial Autocorrelation (PACF) - {target}')
ax[1].set_ylabel('PACF')
plt.tight_layout(rect=[0, 0.03, 1, 0.95])
plt.show()
Based on the plot of ACF and PACF, for all targets on both the Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) plots show significant spikes that gradually decay over time, it strongly suggests that the series contains autocorrelation — meaning past values have a measurable influence on future ones. This is particularly common in time-series data with memory or lag effects. The ACF’s slow decay pattern indicates that a moving average (MA) component may be present, while the PACF plot with very strong spikes at the first two lags points to a likely autoregressive structure of order two, also known as AR(2). In this case, the series is heavily influenced by its own values from one and two time steps prior. Together, these patterns imply that an ARIMA model would be a suitable fit, specifically one with parameters ARIMA(2, 0, q), where "p = 2" captures the autoregressive lags, "d = 0" reflects the fact that the series is stationary (as shown in the ADF test), and "q" is chosen based on how far the ACF continues to show significant autocorrelation. These insights are instrumental in designing lag-based features or selecting model architectures that are sensitive to temporal dynamics, such as ARIMA, SARIMA, or even recurrent neural networks.
In this forecasting project, three lags (lag1, lag2, and lag3) were chosen for time-dependent predictors based on both statistical diagnostics and inference validation. The Partial Autocorrelation Function (PACF) plots consistently showed strong spikes at lag 1 and lag 2, with a noticeable flattening from lag 3 onward. This suggested an autoregressive structure primarily governed by the first two time steps. However, subsequent HAC-corrected regression analysis revealed that certain lag3 features, including economic and health indicators, eg. CPI, BMI, Inflation and Income etc. were still statistically significant (p < 0.05), confirming their meaningful contribution despite weaker autocorrelation beyond lag 2. By including lag3 in the modeling framework alongside lag1 and lag2, the models captured short-term memory effects while allowing for delayed impacts that are often present in real-world socioeconomic dynamics. This decision ensures a balance between temporal relevance and statistical validity, strengthening both the explanatory power and forecasting accuracy of the models.
Residual diagnostics (heteroscedasticity, autocorrelation)¶
Residual diagnostics and the ADF (Augmented Dickey-Fuller) test are important tools in time series modeling that help ensure the models are valid, interpretable, and produce reliable forecasts.
Residual diagnostics involve analyzing the residuals means the differences between the actual values and the predicted values from your model. These diagnostics test whether your model assumptions hold, particularly in regression or forecasting models. For example, the Breusch-Pagan test checks for heteroscedasticity, which is when the variance of residuals is not constant over time. Constant variance is a key assumption in linear regression; if violated, it can lead to inefficient or biased estimates. Similarly, the Ljung-Box test assesses whether residuals are autocorrelated, which means they are correlated across time. If residuals show autocorrelation, your model has likely failed to capture some time-based structure in the data, indicating the model is underfitting or misspecified. Performing these diagnostics ensures that your model is statistically sound and that the insights or forecasts it provides are trustworthy.
# Residual Diagnostics - Test and Summary Table - REVISED
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan
from statsmodels.stats.diagnostic import acorr_ljungbox
import pandas as pd
import numpy as np
from tabulate import tabulate
from google.colab import files
# === Function: Residual Diagnostics for one target ===
def residual_diagnostics(X, y):
data = pd.concat([X, y], axis=1).dropna()
X_cleaned = data[X.columns]
y_cleaned = data[y.name]
X_const = sm.add_constant(X_cleaned)
model = sm.OLS(y_cleaned, X_const).fit()
residuals = model.resid
# Breusch-Pagan Test
bp_test = het_breuschpagan(residuals, X_const.loc[residuals.index])
bp_labels = ['LM stat', 'BP p-value', 'BP f-value', 'BP f p-value']
bp_results = dict(zip(bp_labels, bp_test))
# Ljung-Box Test
if len(residuals) > 10:
lb_test = acorr_ljungbox(residuals, lags=[10], return_df=True)
lb_pvalue = lb_test['lb_pvalue'].iloc[0]
else:
lb_pvalue = "Insufficient data (n < 10)"
return bp_results, lb_pvalue, residuals
# === Setup
target_cols = ['Life expectancy', 'Cardiovascular diseases', 'Diabetes']
diagnostics_summary = []
if 'df_lagged' in locals():
for target_col in target_cols:
print(f"\n Running diagnostics for: {target_col}")
if target_col not in df_lagged.columns:
print(f" Skipping {target_col} — not found in df_lagged")
continue
y = df_lagged[target_col]
X = df_lagged.drop(columns=target_cols, errors='ignore')
data = pd.concat([X, y], axis=1).dropna()
if data.empty:
print(" Not enough data after dropping NaNs")
continue
bp_results, lb_pvalue, residuals = residual_diagnostics(X, y)
diagnostics_summary.append({
"Target": target_col,
"Breusch-Pagan LM stat": round(bp_results['LM stat'], 4),
"BP p-value": round(bp_results['BP p-value'], 4),
"BP f-value": round(bp_results['BP f-value'], 4),
"BP f p-value": round(bp_results['BP f p-value'], 4),
"Ljung-Box p-value (lag=10)": lb_pvalue,
"Residual Mean": round(residuals.mean(), 4),
"Residual Variance": round(residuals.var(), 4)
})
# === Create summary table
diagnostics_df = pd.DataFrame(diagnostics_summary)
# Print as fancy table
print("\n📋 Residual Diagnostics Summary:")
print(tabulate(diagnostics_df, headers='keys', tablefmt='fancy_grid', showindex=False))
# === Export to CSV
filename = "residual_diagnostics_summary.csv"
diagnostics_df.to_csv(filename, index=False)
# Download the file (include full path)
from google.colab import files
files.download(filename)
else:
print("❗ df_lagged is not defined. Please run your preprocessing cell first.")
Running diagnostics for: Life expectancy Running diagnostics for: Cardiovascular diseases Running diagnostics for: Diabetes 📋 Residual Diagnostics Summary: ╒═════════════════════════╤═════════════════════════╤══════════════╤══════════════╤════════════════╤══════════════════════════════╤═════════════════╤═════════════════════╕ │ Target │ Breusch-Pagan LM stat │ BP p-value │ BP f-value │ BP f p-value │ Ljung-Box p-value (lag=10) │ Residual Mean │ Residual Variance │ ╞═════════════════════════╪═════════════════════════╪══════════════╪══════════════╪════════════════╪══════════════════════════════╪═════════════════╪═════════════════════╡ │ Life expectancy │ 1054.96 │ 0 │ 23.3712 │ 0 │ 0 │ 0 │ 11.3211 │ ├─────────────────────────┼─────────────────────────┼──────────────┼──────────────┼────────────────┼──────────────────────────────┼─────────────────┼─────────────────────┤ │ Cardiovascular diseases │ 423.071 │ 0 │ 9.0137 │ 0 │ 0 │ 0 │ 19607.4 │ ├─────────────────────────┼─────────────────────────┼──────────────┼──────────────┼────────────────┼──────────────────────────────┼─────────────────┼─────────────────────┤ │ Diabetes │ 1517.67 │ 0 │ 34.6315 │ 0 │ 0 │ 0 │ 9.8512 │ ╘═════════════════════════╧═════════════════════════╧══════════════╧══════════════╧════════════════╧══════════════════════════════╧═════════════════╧═════════════════════╛
For life expectancy, the model displays significant signs of both heteroscedasticity and autocorrelation. The Breusch-Pagan test results show extremely low p-values, indicating that the variance of residuals is not constant and may vary depending on specific predictor values. This suggests that the linear model may be missing key nonlinear components or interaction terms that could stabilize prediction behavior. Additionally, the Ljung-Box test reveals strong autocorrelation at lag 10, meaning past errors are influencing current ones — a sign that temporal patterns are not fully addressed. While the mean residual is centered at zero, which reflects no bias, the residual variance of 11.32 indicates moderate inconsistency in prediction accuracy across observations.
For diabetes, the residual profile reveals similar issues. The Breusch-Pagan test indicates pronounced heteroscedasticity, reinforcing the idea that predictor influence changes across the prediction space, particularly among metabolic or demographic variables like BMI and age. Autocorrelation is again significant according to the Ljung-Box test, implying model limitations in capturing lagged or sequential health dynamics. Although the mean residual is virtually zero — a good sign for bias — the variance of 9.85 suggests moderate prediction error dispersion, warranting further refinement in feature interaction or time-aware modeling.
The cardiovascular diseases model also shows clear heteroscedasticity, as highlighted by the Breusch-Pagan results with very low p-values. Autocorrelation is present, which points to time-based dependencies not fully captured in the linear framework. Most strikingly, the residual variance is extremely high at 19,607.45, hinting at either model instability, data skewness, or presence of outliers that are drastically affecting performance. Despite having a neutral residual mean, the model appears highly sensitive to certain predictors and may benefit from robust regression techniques or transformations to control volatility.
Overall, all three models demonstrate residual patterns that suggest issues with non-constant variance and temporal correlation. These findings recommend considering more flexible approaches such as time-series models, generalized least squares, or regression techniques that accommodate heteroscedasticity and autocorrelation directly. Enhancing each model to better capture nonlinearities or lag structures could meaningfully improve predictive reliability and interpretability.
According the result of Residual Diagnostics indicate that the model's residuals exhibit both heteroscedasticity and autocorrelation, which violate the assumption of constant variance and independence of residuals ordinary least squares (OLS) regression.
To solve this problem by using robust standard errors (Heteroskedasticity-Autocorrelation Consistent or HAC standard errors) that account for both heteroscedasticity and autocorrelation in the variance-covariance matrix.
HAC corrected standard errors (like from Newey-West estimator) adjust the model's coefficient uncertainty when residuals are non-constant and correlated across time. It doesn't change the point estimates, but it makes the statistical tests more reliable — especially t-values, p-values, and confidence intervals.
# HAC REVISED
import statsmodels.api as sm
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Target columns
target_cols = ['Life expectancy', 'Cardiovascular diseases', 'Diabetes']
# Iterate through each target
for target in target_cols:
print(f"\n=== Newey-West Adjusted OLS Results for: {target} ===")
try:
# Define target and predictors
y = df_lagged[target]
all_target_cols_in_df = [col for col in target_cols if col in df_lagged.columns]
X = df_lagged.drop(columns=all_target_cols_in_df, errors='ignore')
# Combine and clean
data = pd.concat([X, y], axis=1).dropna()
X_cleaned = data[X.columns]
y_cleaned = data[y.name]
# Add constant term
X_const = sm.add_constant(X_cleaned)
# Fit OLS
model = sm.OLS(y_cleaned, X_const).fit()
# Newey-West HAC adjustment
nobs = len(y_cleaned)
maxlags = min(5, nobs - 1)
nw_model = model.get_robustcov_results(cov_type='HAC', maxlags=maxlags)
# Print summary
print(nw_model.summary())
# Residual analysis
residuals = nw_model.resid
print(f"\n📊 Residuals Summary for '{target}':")
print(pd.Series(residuals).describe())
if len(residuals) > 0 and not np.all(residuals == 0):
plt.figure(figsize=(10, 4))
plt.plot(range(len(residuals)), residuals, color='darkblue', linewidth=1)
plt.axhline(0, color='gray', linestyle='--')
plt.title(f"Residuals Over Time — {target}", fontsize=14)
plt.xlabel("Observation Index")
plt.ylabel("Residual")
plt.grid(True)
plt.tight_layout()
plt.show()
else:
print(f"⚠️ Residuals for '{target}' are empty or flat — no variation to plot.")
except Exception as e:
print(f"❌ Could not fit HAC model or plot residuals for {target}: {e}")
# export and download file
best_performance_df.to_csv("best_feature_selection_summary.csv", index=False)
from google.colab import files
files.download("best_feature_selection_summary.csv")
=== Newey-West Adjusted OLS Results for: Life expectancy ===
OLS Regression Results
==============================================================================
Dep. Variable: Life expectancy R-squared: 0.918
Model: OLS Adj. R-squared: 0.917
Method: Least Squares F-statistic: 1.278
Date: Thu, 24 Jul 2025 Prob (F-statistic): 0.276
Time: 19:55:10 Log-Likelihood: -44559.
No. Observations: 16928 AIC: 8.922e+04
Df Residuals: 16879 BIC: 8.959e+04
Df Model: 48
Covariance Type: HAC
======================================================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------------------------------
const -1.387e+08 nan nan nan nan nan
Cost of a healthy diet -5.6642 2.898 -1.955 0.051 -11.344 0.016
Income -1.2168 0.285 -4.270 0.000 -1.775 -0.658
Inflation 0.1587 0.067 2.372 0.018 0.028 0.290
Child mortality rate -26.6933 1.192 -22.396 0.000 -29.029 -24.357
Unemployment Rate -0.2086 0.297 -0.702 0.483 -0.791 0.374
Incomplete tertiary education 0.2633 0.131 2.003 0.045 0.006 0.521
Gini coefficient 12.2883 15.458 0.795 0.427 -18.011 42.588
Sex ratio 2.196e+08 4.07e+09 0.054 0.957 -7.77e+09 8.2e+09
GDP 0.1197 0.081 1.469 0.142 -0.040 0.279
Median age 502.6744 576.528 0.872 0.383 -627.380 1632.729
CPI -0.0719 0.024 -3.044 0.002 -0.118 -0.026
BMI_avg -0.0414 0.039 -1.073 0.283 -0.117 0.034
Cost of a healthy diet_lag1 0.5269 1.822 0.289 0.772 -3.044 4.098
Cost of a healthy diet_lag2 -0.3998 2.777 -0.144 0.886 -5.843 5.043
Cost of a healthy diet_lag3 4.8015 3.009 1.596 0.111 -1.096 10.699
Income_lag1 0.1039 0.201 0.518 0.605 -0.289 0.497
Income_lag2 0.0726 0.244 0.297 0.767 -0.407 0.552
Income_lag3 1.1014 0.325 3.393 0.001 0.465 1.738
Inflation_lag1 0.0803 0.042 1.923 0.054 -0.002 0.162
Inflation_lag2 0.0255 0.042 0.612 0.541 -0.056 0.107
Inflation_lag3 -0.1654 0.077 -2.155 0.031 -0.316 -0.015
Child mortality rate_lag1 0.0505 1.106 0.046 0.964 -2.116 2.217
Child mortality rate_lag2 3.3981 1.241 2.739 0.006 0.966 5.830
Child mortality rate_lag3 10.0025 nan nan nan nan nan
Unemployment Rate_lag1 -0.3134 0.225 -1.396 0.163 -0.754 0.127
Unemployment Rate_lag2 0.5150 0.240 2.142 0.032 0.044 0.986
Unemployment Rate_lag3 -0.1186 0.281 -0.422 0.673 -0.669 0.432
Incomplete tertiary education_lag1 -0.0128 0.125 -0.103 0.918 -0.257 0.231
Incomplete tertiary education_lag2 0.0864 0.159 0.544 0.587 -0.225 0.398
Incomplete tertiary education_lag3 -0.3514 0.200 -1.758 0.079 -0.743 0.040
Gini coefficient_lag1 4.2493 8.833 0.481 0.630 -13.064 21.563
Gini coefficient_lag2 -2.9126 8.570 -0.340 0.734 -19.711 13.886
Gini coefficient_lag3 -18.9249 9.612 -1.969 0.049 -37.765 -0.085
Sex ratio_lag1 4.569e+07 8.87e+09 0.005 0.996 -1.73e+10 1.74e+10
Sex ratio_lag2 1.123e+08 7.4e+09 0.015 0.988 -1.44e+10 1.46e+10
Sex ratio_lag3 1.381e+08 1.97e+09 0.070 0.944 -3.73e+09 4.01e+09
GDP_lag1 0.0178 0.077 0.231 0.817 -0.133 0.169
GDP_lag2 -0.0076 0.025 -0.310 0.757 -0.056 0.041
GDP_lag3 -0.0259 0.051 -0.508 0.611 -0.126 0.074
Median age_lag1 -618.4803 1131.103 -0.547 0.585 -2835.561 1598.600
Median age_lag2 -418.3976 1146.221 -0.365 0.715 -2665.111 1828.316
Median age_lag3 503.8654 599.070 0.841 0.400 -670.375 1678.105
CPI_lag1 -0.0128 0.017 -0.771 0.441 -0.045 0.020
CPI_lag2 0.0047 0.015 0.317 0.751 -0.024 0.034
CPI_lag3 0.0772 0.016 4.783 0.000 0.046 0.109
BMI_avg_lag1 -0.0056 0.026 -0.218 0.828 -0.056 0.045
BMI_avg_lag2 -0.0167 0.028 -0.585 0.558 -0.072 0.039
BMI_avg_lag3 0.0151 0.041 0.365 0.715 -0.066 0.096
==============================================================================
Omnibus: 2958.623 Durbin-Watson: 0.136
Prob(Omnibus): 0.000 Jarque-Bera (JB): 10571.972
Skew: -0.860 Prob(JB): 0.00
Kurtosis: 6.468 Cond. No. 6.74e+11
==============================================================================
Notes:
[1] Standard Errors are heteroscedasticity and autocorrelation robust (HAC) using 5 lags and without small sample correction
[2] The smallest eigenvalue is 2.9e-16. This might indicate that there are
strong multicollinearity problems or that the design matrix is singular.
📊 Residuals Summary for 'Life expectancy':
count 16928.000000
mean 0.000003
std 3.364682
min -32.524090
25% -1.963268
50% 0.267816
75% 2.256538
max 15.502762
dtype: float64
/usr/local/lib/python3.11/dist-packages/statsmodels/base/model.py:1894: ValueWarning: covariance of constraints does not have full rank. The number of constraints is 48, but rank is 4
warnings.warn('covariance of constraints does not have full '
/usr/local/lib/python3.11/dist-packages/statsmodels/regression/linear_model.py:1884: RuntimeWarning: invalid value encountered in sqrt
return np.sqrt(np.diag(self.cov_params()))
=== Newey-West Adjusted OLS Results for: Cardiovascular diseases ===
OLS Regression Results
===================================================================================
Dep. Variable: Cardiovascular diseases R-squared: 0.043
Model: OLS Adj. R-squared: 0.040
Method: Least Squares F-statistic: 0.01330
Date: Thu, 24 Jul 2025 Prob (F-statistic): 1.00
Time: 19:55:10 Log-Likelihood: -1.0767e+05
No. Observations: 16928 AIC: 2.154e+05
Df Residuals: 16879 BIC: 2.158e+05
Df Model: 48
Covariance Type: HAC
======================================================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------------------------------
const -4.323e+09 2.21e+09 -1.955 0.051 -8.66e+09 1.02e+07
Cost of a healthy diet 12.5572 32.502 0.386 0.699 -51.151 76.265
Income 21.3556 17.251 1.238 0.216 -12.457 55.169
Inflation 1.1911 2.883 0.413 0.679 -4.460 6.842
Child mortality rate -1.6502 55.220 -0.030 0.976 -109.888 106.588
Unemployment Rate 16.5400 9.638 1.716 0.086 -2.352 35.432
Incomplete tertiary education -5.2535 7.608 -0.691 0.490 -20.166 9.659
Gini coefficient 160.8072 595.017 0.270 0.787 -1005.488 1327.102
Sex ratio 7.411e+09 1.32e+11 0.056 0.955 -2.51e+11 2.66e+11
GDP 3.3278 2.869 1.160 0.246 -2.295 8.950
Median age 1097.7356 1.82e+04 0.060 0.952 -3.45e+04 3.67e+04
CPI 0.4940 1.274 0.388 0.698 -2.002 2.990
BMI_avg -1.7352 1.782 -0.974 0.330 -5.228 1.757
Cost of a healthy diet_lag1 9.9738 nan nan nan nan nan
Cost of a healthy diet_lag2 11.7875 68.555 0.172 0.863 -122.587 146.162
Cost of a healthy diet_lag3 1.1933 37.275 0.032 0.974 -71.869 74.256
Income_lag1 0.4962 8.482 0.058 0.953 -16.130 17.123
Income_lag2 13.8650 19.042 0.728 0.467 -23.459 51.189
Income_lag3 -12.2323 13.529 -0.904 0.366 -38.751 14.287
Inflation_lag1 -0.4568 1.486 -0.307 0.759 -3.370 2.457
Inflation_lag2 -0.6003 1.574 -0.381 0.703 -3.686 2.485
Inflation_lag3 -1.7078 3.475 -0.491 0.623 -8.519 5.103
Child mortality rate_lag1 1.6709 nan nan nan nan nan
Child mortality rate_lag2 2.8105 nan nan nan nan nan
Child mortality rate_lag3 8.4775 17.638 0.481 0.631 -26.096 43.051
Unemployment Rate_lag1 -0.7751 2.350 -0.330 0.742 -5.382 3.832
Unemployment Rate_lag2 0.5853 0.323 1.811 0.070 -0.048 1.219
Unemployment Rate_lag3 -5.4142 5.114 -1.059 0.290 -15.438 4.609
Incomplete tertiary education_lag1 -0.0505 4.402 -0.011 0.991 -8.679 8.578
Incomplete tertiary education_lag2 0.4994 1.740 0.287 0.774 -2.911 3.909
Incomplete tertiary education_lag3 -2.5220 7.285 -0.346 0.729 -16.801 11.757
Gini coefficient_lag1 50.2384 134.988 0.372 0.710 -214.351 314.828
Gini coefficient_lag2 197.9054 214.176 0.924 0.355 -221.902 617.712
Gini coefficient_lag3 -761.0729 290.835 -2.617 0.009 -1331.140 -191.006
Sex ratio_lag1 1.369e+09 nan nan nan nan nan
Sex ratio_lag2 1.918e+09 nan nan nan nan nan
Sex ratio_lag3 5.383e+09 nan nan nan nan nan
GDP_lag1 0.2412 nan nan nan nan nan
GDP_lag2 -0.0958 nan nan nan nan nan
GDP_lag3 1.1461 2.557 0.448 0.654 -3.867 6.159
Median age_lag1 -1056.8683 3.13e+04 -0.034 0.973 -6.24e+04 6.03e+04
Median age_lag2 5836.9081 nan nan nan nan nan
Median age_lag3 -3728.3101 nan nan nan nan nan
CPI_lag1 0.0970 0.685 0.142 0.887 -1.247 1.441
CPI_lag2 0.1355 0.706 0.192 0.848 -1.248 1.519
CPI_lag3 1.4261 1.293 1.103 0.270 -1.108 3.961
BMI_avg_lag1 -0.1263 0.903 -0.140 0.889 -1.896 1.643
BMI_avg_lag2 -0.0982 0.921 -0.107 0.915 -1.904 1.708
BMI_avg_lag3 -3.5173 1.732 -2.031 0.042 -6.912 -0.123
==============================================================================
Omnibus: 26274.800 Durbin-Watson: 0.029
Prob(Omnibus): 0.000 Jarque-Bera (JB): 9869872.713
Skew: 10.133 Prob(JB): 0.00
Kurtosis: 119.544 Cond. No. 6.74e+11
==============================================================================
Notes:
[1] Standard Errors are heteroscedasticity and autocorrelation robust (HAC) using 5 lags and without small sample correction
[2] The smallest eigenvalue is 2.9e-16. This might indicate that there are
strong multicollinearity problems or that the design matrix is singular.
📊 Residuals Summary for 'Cardiovascular diseases':
count 16928.000000
mean 0.000031
std 140.026594
min -114.217116
25% -36.951825
50% -17.545906
75% 2.043523
max 1848.846904
dtype: float64
/usr/local/lib/python3.11/dist-packages/statsmodels/base/model.py:1894: ValueWarning: covariance of constraints does not have full rank. The number of constraints is 48, but rank is 4
warnings.warn('covariance of constraints does not have full '
/usr/local/lib/python3.11/dist-packages/statsmodels/regression/linear_model.py:1884: RuntimeWarning: invalid value encountered in sqrt
return np.sqrt(np.diag(self.cov_params()))
=== Newey-West Adjusted OLS Results for: Diabetes ===
OLS Regression Results
==============================================================================
Dep. Variable: Diabetes R-squared: 0.539
Model: OLS Adj. R-squared: 0.537
Method: Least Squares F-statistic: 2.167
Date: Thu, 24 Jul 2025 Prob (F-statistic): 0.0700
Time: 19:55:11 Log-Likelihood: -43381.
No. Observations: 16928 AIC: 8.686e+04
Df Residuals: 16879 BIC: 8.724e+04
Df Model: 48
Covariance Type: HAC
======================================================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------------------------------
const -1.294e+08 nan nan nan nan nan
Cost of a healthy diet 6.7514 1.664 4.058 0.000 3.490 10.013
Income -1.0163 0.277 -3.675 0.000 -1.558 -0.474
Inflation -0.1576 0.059 -2.650 0.008 -0.274 -0.041
Child mortality rate -0.1721 0.736 -0.234 0.815 -1.616 1.271
Unemployment Rate 0.0279 0.183 0.152 0.879 -0.331 0.387
Incomplete tertiary education -0.2060 0.048 -4.335 0.000 -0.299 -0.113
Gini coefficient -11.5868 nan nan nan nan nan
Sex ratio 1.678e+08 3.41e+09 0.049 0.961 -6.52e+09 6.85e+09
GDP -0.1475 0.030 -4.868 0.000 -0.207 -0.088
Median age 503.8324 343.204 1.468 0.142 -168.883 1176.548
CPI 0.0779 0.022 3.592 0.000 0.035 0.120
BMI_avg -0.0456 0.057 -0.799 0.424 -0.157 0.066
Cost of a healthy diet_lag1 0.0382 0.929 0.041 0.967 -1.783 1.860
Cost of a healthy diet_lag2 0.8687 1.825 0.476 0.634 -2.708 4.446
Cost of a healthy diet_lag3 0.7153 2.080 0.344 0.731 -3.362 4.792
Income_lag1 -0.2537 0.185 -1.369 0.171 -0.617 0.110
Income_lag2 -0.0917 0.262 -0.350 0.726 -0.606 0.422
Income_lag3 -0.5161 0.281 -1.835 0.067 -1.067 0.035
Inflation_lag1 -0.0909 0.038 -2.403 0.016 -0.165 -0.017
Inflation_lag2 -0.0971 0.044 -2.215 0.027 -0.183 -0.011
Inflation_lag3 -0.1121 0.049 -2.288 0.022 -0.208 -0.016
Child mortality rate_lag1 -0.3867 0.523 -0.739 0.460 -1.412 0.639
Child mortality rate_lag2 0.0770 0.278 0.277 0.782 -0.469 0.623
Child mortality rate_lag3 0.6715 0.251 2.675 0.007 0.179 1.164
Unemployment Rate_lag1 -0.0285 0.107 -0.268 0.789 -0.237 0.180
Unemployment Rate_lag2 -0.0775 0.092 -0.845 0.398 -0.257 0.102
Unemployment Rate_lag3 -0.1296 0.188 -0.689 0.491 -0.498 0.239
Incomplete tertiary education_lag1 0.0421 0.092 0.457 0.648 -0.138 0.223
Incomplete tertiary education_lag2 -0.0203 0.120 -0.169 0.866 -0.256 0.216
Incomplete tertiary education_lag3 -0.1239 0.153 -0.811 0.417 -0.423 0.176
Gini coefficient_lag1 -7.6403 7.914 -0.965 0.334 -23.153 7.873
Gini coefficient_lag2 -1.8097 5.717 -0.317 0.752 -13.015 9.395
Gini coefficient_lag3 1.5183 8.841 0.172 0.864 -15.810 18.847
Sex ratio_lag1 1.984e+07 5.37e+09 0.004 0.997 -1.05e+10 1.05e+10
Sex ratio_lag2 4.239e+07 nan nan nan nan nan
Sex ratio_lag3 2.511e+08 nan nan nan nan nan
GDP_lag1 -0.0006 0.052 -0.012 0.990 -0.103 0.102
GDP_lag2 -0.0003 nan nan nan nan nan
GDP_lag3 -0.0483 0.041 -1.165 0.244 -0.130 0.033
Median age_lag1 -718.9570 695.114 -1.034 0.301 -2081.454 643.539
Median age_lag2 1089.5169 611.331 1.782 0.075 -108.757 2287.790
Median age_lag3 -996.5964 382.339 -2.607 0.009 -1746.020 -247.173
CPI_lag1 -0.0030 0.013 -0.234 0.815 -0.028 0.022
CPI_lag2 -0.0011 0.014 -0.080 0.937 -0.028 0.026
CPI_lag3 0.0204 0.023 0.882 0.378 -0.025 0.066
BMI_avg_lag1 -0.0006 0.031 -0.021 0.984 -0.060 0.059
BMI_avg_lag2 -0.0014 0.034 -0.042 0.967 -0.068 0.065
BMI_avg_lag3 0.9702 0.056 17.417 0.000 0.861 1.079
==============================================================================
Omnibus: 4120.296 Durbin-Watson: 0.108
Prob(Omnibus): 0.000 Jarque-Bera (JB): 12682.365
Skew: 1.250 Prob(JB): 0.00
Kurtosis: 6.425 Cond. No. 6.74e+11
==============================================================================
Notes:
[1] Standard Errors are heteroscedasticity and autocorrelation robust (HAC) using 5 lags and without small sample correction
[2] The smallest eigenvalue is 2.9e-16. This might indicate that there are
strong multicollinearity problems or that the design matrix is singular.
📊 Residuals Summary for 'Diabetes':
count 16928.000000
mean 0.000001
std 3.138661
min -9.906907
25% -2.044869
50% -0.612431
75% 1.408433
max 24.396272
dtype: float64
/usr/local/lib/python3.11/dist-packages/statsmodels/base/model.py:1894: ValueWarning: covariance of constraints does not have full rank. The number of constraints is 48, but rank is 4
warnings.warn('covariance of constraints does not have full '
/usr/local/lib/python3.11/dist-packages/statsmodels/regression/linear_model.py:1884: RuntimeWarning: invalid value encountered in sqrt
return np.sqrt(np.diag(self.cov_params()))
Result of HAC:
The OLS regression summary reveals varying levels of model performance across the three health outcomes. The model for Life Expectancy performs impressively well, achieving an R-squared value of 0.918 and an adjusted R-squared of 0.917. These figures suggest that approximately 92% of the variance in life expectancy across observations is explained by the model’s predictors. Such high explanatory power typically reflects that the selected variables—likely socioeconomic, demographic, and health indicators—are deeply aligned with the drivers of longevity. However, despite the strong fit, the F-statistic is relatively low (1.278) and its p-value (0.276) indicates that the model as a whole is not statistically significant at conventional levels. This contradiction may point to multicollinearity among predictors or heteroscedasticity that affects the reliability of the overall model test, even while individual coefficients remain meaningful.
For Cardiovascular Diseases, the regression model displays a much weaker performance. The R-squared is only 0.043, suggesting that the predictors explain just 4.3% of the variation in cardiovascular disease prevalence. The adjusted R-squared is nearly identical at 0.040, further confirming the low explanatory power. The F-statistic is close to zero (0.0133) and the p-value is 1.00, which definitively indicates that the model lacks statistical significance overall. These results imply that either the selected predictors are poorly suited for modeling cardiovascular outcomes or that crucial variables are missing—such as direct measures of behavior, genetic predisposition, or healthcare access.
The Diabetes model performs moderately well. An R-squared of 0.539 and adjusted R-squared of 0.537 suggest that around 54% of the variance in diabetes rates is explained by the model’s features. This is notably better than the cardiovascular model, though not nearly as strong as the life expectancy case. The F-statistic of 2.167 implies some model-wide explanatory power, and the p-value (0.070) teeters just above conventional thresholds for significance. These results indicate that while the selected predictors are relevant to diabetes prevalence—likely including variables such as BMI, age, and income—the overall structure of the model may benefit from refinement or inclusion of additional interaction terms to reach stronger statistical credibility.
# Stable Predictors Bar Charts
import matplotlib.pyplot as plt
import seaborn as sns
# Convert p-values to float for plotting
stable_summary_df['p-value'] = stable_summary_df['p-value < 0.05'].astype(float)
# Set plot style
sns.set(style="whitegrid")
# Get list of unique targets
targets = stable_summary_df['Target'].unique()
# Create one bar chart per target
for target in targets:
plt.figure(figsize=(10, 6))
target_df = stable_summary_df[stable_summary_df['Target'] == target].copy()
# Sort by p-value (lowest = most significant)
target_df = target_df.sort_values('p-value', ascending=True)
# Barplot of -log10(p-value) for visibility
sns.barplot(
data=target_df,
x=-np.log10(target_df['p-value']), # higher bar = more significant
y='Stable Predictor',
palette='viridis'
)
plt.title(f"Stable Predictors for {target} (p < 0.05)", fontsize=14)
plt.xlabel('-log10(p-value)')
plt.ylabel('Predictor')
plt.tight_layout()
plt.show()
/tmp/ipython-input-21-1853306393.py:24: FutureWarning: Passing `palette` without assigning `hue` is deprecated and will be removed in v0.14.0. Assign the `y` variable to `hue` and set `legend=False` for the same effect. sns.barplot(
/tmp/ipython-input-21-1853306393.py:24: FutureWarning: Passing `palette` without assigning `hue` is deprecated and will be removed in v0.14.0. Assign the `y` variable to `hue` and set `legend=False` for the same effect. sns.barplot(
/tmp/ipython-input-21-1853306393.py:24: FutureWarning: Passing `palette` without assigning `hue` is deprecated and will be removed in v0.14.0. Assign the `y` variable to `hue` and set `legend=False` for the same effect. sns.barplot(
Forecast Comparison of ARIMA, Prophet, and Random Forest Across Three Targets¶
# Forecast Comparison of ARIMA, Prophet, and Random Forest Across Three Targets – Life Expectancy, Diabetes and Cardiovascular Disease between 1950–2074
import matplotlib.pyplot as plt
import numpy as np
def plot_target_forecast(df_model_all, df_eval_ready, country, target):
# Years
full_years = list(range(1950, 2075))
eval_years = [2021, 2022, 2023]
forecast_start, forecast_end = 2024, 2074
# === Actual values for 1950–2023
df_actual = df_eval_ready[
(df_eval_ready['Country'] == country) &
(df_eval_ready['Year'].between(1950, 2023))
].sort_values('Year')
actual_years = df_actual['Year'].values
actual_vals = df_actual[target].values
# === Predictions from df_model_comparison for 2021–2074
df_pred = df_model_all[
(df_model_all['Country'] == country) &
(df_model_all['Target'] == target) &
(df_model_all['Year'].between(2021, 2074))
].sort_values('Year')
pred_years = df_pred['Year'].values
rf_vals = df_pred['RF_Forecast'].values
arima_vals = df_pred['ARIMA_Forecast'].values
prophet_vals = df_pred['Prophet_Forecast'].values
# Split prediction into eval + forecast ranges
rf_eval, rf_forecast = [], []
arima_eval, arima_forecast = [], []
prophet_eval, prophet_forecast = [], []
for yr, rf, ar, pr in zip(pred_years, rf_vals, arima_vals, prophet_vals):
if yr in eval_years:
rf_eval.append((yr, rf))
arima_eval.append((yr, ar))
prophet_eval.append((yr, pr))
else:
rf_forecast.append((yr, rf))
arima_forecast.append((yr, ar))
prophet_forecast.append((yr, pr))
# Begin plot
plt.figure(figsize=(14, 6))
# Shaded forecast area
plt.axvspan(forecast_start, forecast_end, color='gray', alpha=0.12, label="Forecast Horizon")
# Actual line
plt.plot(actual_years, actual_vals, label="Actual", color='orange', linewidth=2)
# Prediction lines (2021–2023)
if rf_eval: plt.plot(*zip(*rf_eval), label="RF Eval", color='dodgerblue', linestyle='dashed', linewidth=2)
if arima_eval: plt.plot(*zip(*arima_eval), label="ARIMA Eval", color='forestgreen', linestyle='dashed', linewidth=2)
if prophet_eval: plt.plot(*zip(*prophet_eval), label="Prophet Eval", color='darkorchid', linestyle='dashed', linewidth=2)
# Forecast lines (2024–2074)
if rf_forecast: plt.plot(*zip(*rf_forecast), label="RF Forecast", color='dodgerblue', linewidth=2)
if arima_forecast: plt.plot(*zip(*arima_forecast), label="ARIMA Forecast", color='forestgreen', linewidth=2)
if prophet_forecast: plt.plot(*zip(*prophet_forecast), label="Prophet Forecast", color='darkorchid', linewidth=2)
# Final plot touches
plt.title(f"{target} — Actual, Evaluation & Forecast Comparison ({country})", fontsize=16)
plt.xlabel("Year")
plt.ylabel("Value")
plt.grid(True)
plt.legend()
plt.xlim(1950, 2074)
plt.tight_layout()
plt.show()
selected_countries = [
'United States', 'Germany', 'Japan', 'Brazil', 'India',
'Indonesia', 'Nigeria', 'Kenya', 'Mexico', 'Bangladesh'
]
selected_targets = ["Life expectancy", "Diabetes", "Cardiovascular diseases"]
for country in selected_countries:
for target in selected_targets:
plot_target_forecast(df_model_comparison, df_forecast_ready, country, target)
# Plot comparison for 4 countries
import matplotlib.pyplot as plt
import seaborn as sns
# Countries and years to plot
countries_to_plot = ['United States', 'Mexico', 'India', 'Japan']
years_to_plot = [2021, 2022, 2023]
# combine all result into one dataframe
val_results = pd.concat([
pd.concat(arima_val_all, ignore_index=True),
pd.concat(prophet_val_all, ignore_index=True),
pd.concat(rf_val_all, ignore_index=True)
], ignore_index=True)
val_results['Model'] = val_results['Model'].replace({'RandomForest': 'Random Forest'})
# Filter validation results for these countries and years
plot_df = val_results[
(val_results['Country'].isin(countries_to_plot)) &
(val_results['Year'].isin(years_to_plot))
].copy()
# Example for one target variable, say target = 'Cardiovascular diseases'
target_of_interest = 'Cardiovascular diseases'
plot_df = plot_df[plot_df['Target'] == target_of_interest]
# Set seaborn style
sns.set(style="whitegrid")
# Create a separate plot for each country with actual vs predicted lines for each model
fig, axs = plt.subplots(2, 2, figsize=(16, 10), sharey=True)
axs = axs.flatten()
for i, country in enumerate(countries_to_plot):
ax = axs[i]
country_data = plot_df[plot_df['Country'] == country]
# Plot Actual values
actual_data = country_data[['Year', 'Actual']].drop_duplicates()
ax.plot(actual_data['Year'], actual_data['Actual'], label='Actual', color='black', marker='o')
# Plot Forecasts from each model
for model in country_data['Model'].unique():
model_data = country_data[country_data['Model'] == model]
ax.plot(model_data['Year'], model_data['Forecast'], label=f'Forecast ({model})', marker='x')
ax.set_title(f'{country} - Actual vs Predicted ({target_of_interest})')
ax.set_xlabel('Year')
ax.set_ylabel('Value')
ax.legend()
ax.grid(True)
plt.tight_layout()
plt.show()
Time Series Forecasting with Walk-Forward Validation using ARIMA, Prophet, and Random Forests (RMSE Evaluation)¶
Rolling Forecast Validation (Walk-Forward)¶
Rolling or walk-forward forecast validation is a technique used to evaluate the performance of time series forecasting models in a way that closely resembles real-world forecasting scenarios. Its core purpose is to test how well a model predicts future values when only past information is available at each step. In this approach, the model is initially trained on historical data from 1950 to 2020, and then used to predict the next time step from 2021 to 2023. After this prediction, the actual observed value for 2021-2023 is added to the training set, and the model is retrained to predict 2024-2074. This process is repeated step-by-step, moving forward through time.
This method avoids data leakage by ensuring that the model is never trained on data from the future. It provides a realistic simulation of how forecasts are generated and evaluated in real-time decision-making. Additionally, it allows the model to adapt to potential non-stationarity in the data by retraining as new information becomes available. Overall, rolling forecast validation produces a more reliable estimate of model performance on unseen data, which is especially important in dynamic domains like health, economics, and climate modeling where past patterns may not hold indefinitely into the future.
10 diversity Countries have been selected by their income level for rolling forecast validation (Walk-Forward) as follows:
- United States - High-income
- Germany - High-income
- Japan - High-income
- Brazil - Upper-middle-income
- India - Lower-middle-income
- Indonesia - Lower-middle-income
- Nigeria - Low-income
- Kenya - Low-income
- Mexico - Upper-middle-income
- Bangladesh - Low-middle-income
# Rolling Forecast - Walk Forward Validation
from statsmodels.tsa.arima.model import ARIMA
from prophet import Prophet
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# === Setup ===
selected_countries = [
'United States', 'Germany', 'Japan', 'Brazil', 'India',
'Indonesia', 'Nigeria', 'Kenya', 'Mexico', 'Bangladesh'
]
target_columns = ['Life expectancy', 'Cardiovascular diseases', 'Diabetes']
selected_features_dict = {
'Life expectancy': [
'Child mortality rate' , 'GDP' , 'CPI_lag3' , 'Incomplete tertiary education_lag3' , 'Income_lag3' , 'Income',
'CPI' , 'Inflation', 'Inflation_lag1', 'Cost of a healthy diet', 'Cost of a healthy diet_lag3' , 'Unemployment Rate_lag2',
'Gini coefficient_lag3', 'Unemployment Rate_lag1'
],
'Cardiovascular diseases': [
'BMI_avg_lag3'
],
'Diabetes': [
'BMI_avg_lag3', 'CPI' , 'GDP' , 'Income','Income_lag1', 'Inflation_lag1', 'Inflation' , 'Cost of a healthy diet' , 'Inflation_lag2' ,
'Inflation_lag3'
]
}
start_train = 1950
end_train = 2020
real_eval_period = [2021, 2022, 2023]
# Forecast horizon starts from 2024 and goes till 2074
forecast_horizon = list(range(2024, 2074))
# === Create future rows for years 2024 to 2073
future_rows = []
for country in df_combined_with_country['Country'].unique():
for year in forecast_horizon:
future_rows.append({'Country': country, 'Year': year})
df_future = pd.DataFrame(future_rows)
df_forecast_ready = pd.concat([df_combined_with_country, df_future], ignore_index=True)
df_forecast_ready['Year'] = df_forecast_ready['Year'].astype(int)
# === Impute missing values across all countries and years
df_forecast_ready = (
df_forecast_ready
.sort_values(['Country', 'Year'])
.groupby('Country', group_keys=False)
.apply(lambda x: x.ffill().bfill().infer_objects(copy=False))
.reset_index(drop=True)
)
# === Initialize summary table
predictions_summary = []
# === Forecast Loop ===
for country in selected_countries:
df_country = df_forecast_ready[df_forecast_ready['Country'] == country].sort_values('Year')
for target in target_columns:
print(f"\n {country} — {target}")
if target not in df_country.columns:
print(" Target missing")
continue
features = selected_features_dict.get(target, [])
available_features = [f for f in features if f in df_country.columns]
if not available_features:
print(" No usable features found")
continue
df_train = df_country[df_country['Year'].between(start_train, end_train)]
df_eval_real = df_country[df_country['Year'].isin(real_eval_period)]
# === ARIMA ===
arima_rmse = None
try:
df_train_arima = df_train[[target]].copy()
df_train_arima.index = pd.date_range(start=f'{start_train}', periods=len(df_train_arima), freq='YE')
model_arima = ARIMA(df_train_arima, order=(1, 1, 1)).fit()
# Real evaluation
pred_real = model_arima.predict(start=len(df_train_arima), end=len(df_train_arima)+len(df_eval_real)-1)
actual_real = df_eval_real[target].values
arima_rmse = np.sqrt(mean_squared_error(actual_real, pred_real))
# Forecast for 2024-2073
arima_forecast = model_arima.predict(start=len(df_train_arima), end=len(df_train_arima) + len(forecast_horizon) - 1)
print(f"📉 ARIMA RMSE: {arima_rmse:.2f}")
except Exception as e:
print(f" ARIMA error: {e}")
# === Prophet ===
prophet_rmse = None
try:
prophet_df = df_train[['Year', target]].rename(columns={'Year': 'ds', target: 'y'})
prophet_df['ds'] = pd.to_datetime(prophet_df['ds'], format='%Y')
model_prophet = Prophet()
model_prophet.fit(prophet_df)
future_years = real_eval_period + forecast_horizon
future_dates = pd.DataFrame({'ds': pd.to_datetime(future_years, format='%Y')})
forecast_prophet = model_prophet.predict(future_dates)
# Real evaluation
pred_real = forecast_prophet[forecast_prophet['ds'].dt.year.isin(real_eval_period)]['yhat'].values
actual_real = df_eval_real[target].values
prophet_rmse = np.sqrt(mean_squared_error(actual_real, pred_real))
# Forecast for 2024-2073
prophet_forecast = forecast_prophet[forecast_prophet['ds'].dt.year.isin(forecast_horizon)]
print(f" Prophet RMSE: {prophet_rmse:.2f}")
except Exception as e:
print(f" Prophet error: {e}")
#### Random Forest ####
#from google.colab import data_table
#data_table.DataTable(df_forecast)
rf_rmse, rf_forecast = None, [None] * len(df_future)
try:
X = df_country[available_features]
y = df_country[target]
X_train = X[df_country['Year'].between(start_train, end_train)]
y_train = y[df_country['Year'].between(start_train, end_train)]
X_eval = X[df_country['Year'].isin(real_eval_period)]
y_eval = y[df_country['Year'].isin(real_eval_period)]
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
pred_eval = model.predict(X_eval)
rf_rmse = np.sqrt(mean_squared_error(y_eval, pred_eval))
X_forecast = X[df_country['Year'].isin(forecast_horizon)]
if not X_forecast.isnull().any(axis=1).any():
rf_forecast = model.predict(X_forecast).tolist()
except:
pass
# === Append to summary ===
predictions_summary.append({
"Country": country,
"Target": target,
"ARIMA_RMSE": round(arima_rmse, 4) if arima_rmse is not None else None,
"Prophet_RMSE": round(prophet_rmse, 4) if prophet_rmse is not None else None,
"RF_RMSE": round(rf_rmse, 4) if rf_rmse is not None else None
})
# === Final Summary Table ===
df_forecast_validation_summary = pd.DataFrame(predictions_summary)
df_forecast_validation_summary = df_forecast_validation_summary[[
"Country", "Target",
"ARIMA_RMSE", "Prophet_RMSE", "RF_RMSE"
]]
print("\n 📋 Rolling Forecast Validation Summary:")
print(df_forecast_validation_summary)
# Export summary
df_forecast_validation_summary.to_csv("forecast_summary.csv", index=False)
# Download to your computer
from google.colab import files
files.download("forecast_summary.csv")
/tmp/ipython-input-37-3481924030.py:57: FutureWarning: Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
.apply(lambda x: x.ffill().bfill().infer_objects(copy=False))
/tmp/ipython-input-37-3481924030.py:57: DeprecationWarning: DataFrameGroupBy.apply operated on the grouping columns. This behavior is deprecated, and in a future version of pandas the grouping columns will be excluded from the operation. Either pass `include_groups=False` to exclude the groupings or explicitly select the grouping columns after groupby to silence this warning.
.apply(lambda x: x.ffill().bfill().infer_objects(copy=False))
/usr/local/lib/python3.11/dist-packages/statsmodels/tsa/statespace/sarimax.py:966: UserWarning: Non-stationary starting autoregressive parameters found. Using zeros as starting parameters.
warn('Non-stationary starting autoregressive parameters'
United States — Life expectancy 📉 ARIMA RMSE: 2.00
INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/hv2ktus1.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/oe_r3aoy.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=47361', 'data', 'file=/tmp/tmprjkocm4m/hv2ktus1.json', 'init=/tmp/tmprjkocm4m/oe_r3aoy.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_model3c9z48hr/prophet_model-20250723141100.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:11:00 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:11:02 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing
Prophet RMSE: 1.56 United States — Cardiovascular diseases
INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/sa5_nkw_.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/pn5w6eso.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=25332', 'data', 'file=/tmp/tmprjkocm4m/sa5_nkw_.json', 'init=/tmp/tmprjkocm4m/pn5w6eso.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelyycjtjmt/prophet_model-20250723141104.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:11:04 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing
📉 ARIMA RMSE: 1.19
14:11:04 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing
Prophet RMSE: 11.97 United States — Diabetes
INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/sghhl48c.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/ahdz28xn.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=97841', 'data', 'file=/tmp/tmprjkocm4m/sghhl48c.json', 'init=/tmp/tmprjkocm4m/ahdz28xn.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modely7ekj70r/prophet_model-20250723141105.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:11:05 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing
📉 ARIMA RMSE: 0.01
14:11:05 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing
Prophet RMSE: 0.49 Germany — Life expectancy
/usr/local/lib/python3.11/dist-packages/statsmodels/tsa/statespace/sarimax.py:966: UserWarning: Non-stationary starting autoregressive parameters found. Using zeros as starting parameters.
warn('Non-stationary starting autoregressive parameters'
/usr/local/lib/python3.11/dist-packages/statsmodels/tsa/statespace/sarimax.py:978: UserWarning: Non-invertible starting MA parameters found. Using zeros as starting parameters.
warn('Non-invertible starting MA parameters found.'
INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this.
INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/k2qniys2.json
DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/atdkbrr0.json
DEBUG:cmdstanpy:idx 0
DEBUG:cmdstanpy:running CmdStan, num_threads: None
DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=21012', 'data', 'file=/tmp/tmprjkocm4m/k2qniys2.json', 'init=/tmp/tmprjkocm4m/atdkbrr0.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelnw233o6r/prophet_model-20250723141106.csv', 'method=optimize', 'algorithm=newton', 'iter=10000']
14:11:06 - cmdstanpy - INFO - Chain [1] start processing
INFO:cmdstanpy:Chain [1] start processing
📉 ARIMA RMSE: 0.47
14:11:07 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing
Prophet RMSE: 0.61 Germany — Cardiovascular diseases
INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/ijt_1k_k.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/xt8pr25r.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=48144', 'data', 'file=/tmp/tmprjkocm4m/ijt_1k_k.json', 'init=/tmp/tmprjkocm4m/xt8pr25r.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelog7no6i2/prophet_model-20250723141108.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:11:08 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing
📉 ARIMA RMSE: 0.43
14:11:09 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing
Prophet RMSE: 2.13 Germany — Diabetes
INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/j6behbrs.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/32yl0owc.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=97012', 'data', 'file=/tmp/tmprjkocm4m/j6behbrs.json', 'init=/tmp/tmprjkocm4m/32yl0owc.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelq_r770jn/prophet_model-20250723141109.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:11:09 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing
📉 ARIMA RMSE: 0.00
14:11:09 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing
Prophet RMSE: 2.76 Japan — Life expectancy
/usr/local/lib/python3.11/dist-packages/statsmodels/tsa/statespace/sarimax.py:966: UserWarning: Non-stationary starting autoregressive parameters found. Using zeros as starting parameters.
warn('Non-stationary starting autoregressive parameters'
/usr/local/lib/python3.11/dist-packages/statsmodels/tsa/statespace/sarimax.py:978: UserWarning: Non-invertible starting MA parameters found. Using zeros as starting parameters.
warn('Non-invertible starting MA parameters found.'
INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this.
INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/df3mky2k.json
DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/i0y34wue.json
DEBUG:cmdstanpy:idx 0
DEBUG:cmdstanpy:running CmdStan, num_threads: None
DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=77900', 'data', 'file=/tmp/tmprjkocm4m/df3mky2k.json', 'init=/tmp/tmprjkocm4m/i0y34wue.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelzl5v553j/prophet_model-20250723141110.csv', 'method=optimize', 'algorithm=newton', 'iter=10000']
14:11:10 - cmdstanpy - INFO - Chain [1] start processing
INFO:cmdstanpy:Chain [1] start processing
📉 ARIMA RMSE: 0.64
14:11:10 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing
Prophet RMSE: 0.58 Japan — Cardiovascular diseases
/usr/local/lib/python3.11/dist-packages/statsmodels/tsa/statespace/sarimax.py:966: UserWarning: Non-stationary starting autoregressive parameters found. Using zeros as starting parameters.
warn('Non-stationary starting autoregressive parameters'
/usr/local/lib/python3.11/dist-packages/statsmodels/tsa/statespace/sarimax.py:978: UserWarning: Non-invertible starting MA parameters found. Using zeros as starting parameters.
warn('Non-invertible starting MA parameters found.'
INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this.
INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/v4jzd52u.json
DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/eys9n03c.json
DEBUG:cmdstanpy:idx 0
DEBUG:cmdstanpy:running CmdStan, num_threads: None
DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=52127', 'data', 'file=/tmp/tmprjkocm4m/v4jzd52u.json', 'init=/tmp/tmprjkocm4m/eys9n03c.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modela5jsybat/prophet_model-20250723141111.csv', 'method=optimize', 'algorithm=newton', 'iter=10000']
14:11:11 - cmdstanpy - INFO - Chain [1] start processing
INFO:cmdstanpy:Chain [1] start processing
📉 ARIMA RMSE: 1.55
14:11:11 - cmdstanpy - INFO - Chain [1] done processing
INFO:cmdstanpy:Chain [1] done processing
/usr/local/lib/python3.11/dist-packages/statsmodels/tsa/statespace/sarimax.py:966: UserWarning: Non-stationary starting autoregressive parameters found. Using zeros as starting parameters.
warn('Non-stationary starting autoregressive parameters'
/usr/local/lib/python3.11/dist-packages/statsmodels/tsa/statespace/sarimax.py:978: UserWarning: Non-invertible starting MA parameters found. Using zeros as starting parameters.
warn('Non-invertible starting MA parameters found.'
INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this.
INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
Prophet RMSE: 7.69 Japan — Diabetes 📉 ARIMA RMSE: 0.00
DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/sqo53_sk.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/nw3ja3jo.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=30846', 'data', 'file=/tmp/tmprjkocm4m/sqo53_sk.json', 'init=/tmp/tmprjkocm4m/nw3ja3jo.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelao3gb3zi/prophet_model-20250723141111.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:11:11 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:11:12 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing
Prophet RMSE: 1.84 Brazil — Life expectancy
INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/_mkeoghc.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/5d68fu18.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=47694', 'data', 'file=/tmp/tmprjkocm4m/_mkeoghc.json', 'init=/tmp/tmprjkocm4m/5d68fu18.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelemn_5pxf/prophet_model-20250723141112.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:11:12 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing
📉 ARIMA RMSE: 3.01
14:11:12 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing
Prophet RMSE: 2.19 Brazil — Cardiovascular diseases
/usr/local/lib/python3.11/dist-packages/statsmodels/tsa/statespace/sarimax.py:966: UserWarning: Non-stationary starting autoregressive parameters found. Using zeros as starting parameters.
warn('Non-stationary starting autoregressive parameters'
/usr/local/lib/python3.11/dist-packages/statsmodels/tsa/statespace/sarimax.py:978: UserWarning: Non-invertible starting MA parameters found. Using zeros as starting parameters.
warn('Non-invertible starting MA parameters found.'
INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this.
INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/k4tsl6ub.json
DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/brdxiy8q.json
DEBUG:cmdstanpy:idx 0
DEBUG:cmdstanpy:running CmdStan, num_threads: None
DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=12825', 'data', 'file=/tmp/tmprjkocm4m/k4tsl6ub.json', 'init=/tmp/tmprjkocm4m/brdxiy8q.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelb1p2eqd8/prophet_model-20250723141113.csv', 'method=optimize', 'algorithm=newton', 'iter=10000']
14:11:13 - cmdstanpy - INFO - Chain [1] start processing
INFO:cmdstanpy:Chain [1] start processing
📉 ARIMA RMSE: 1.82
14:11:13 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing
Prophet RMSE: 6.55 Brazil — Diabetes 📉 ARIMA RMSE: 0.00
/usr/local/lib/python3.11/dist-packages/statsmodels/tsa/statespace/sarimax.py:966: UserWarning: Non-stationary starting autoregressive parameters found. Using zeros as starting parameters.
warn('Non-stationary starting autoregressive parameters'
/usr/local/lib/python3.11/dist-packages/statsmodels/tsa/statespace/sarimax.py:978: UserWarning: Non-invertible starting MA parameters found. Using zeros as starting parameters.
warn('Non-invertible starting MA parameters found.'
INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this.
INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/2tv9vzvy.json
DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/34ls6e25.json
DEBUG:cmdstanpy:idx 0
DEBUG:cmdstanpy:running CmdStan, num_threads: None
DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=19841', 'data', 'file=/tmp/tmprjkocm4m/2tv9vzvy.json', 'init=/tmp/tmprjkocm4m/34ls6e25.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelic3ef0zy/prophet_model-20250723141113.csv', 'method=optimize', 'algorithm=newton', 'iter=10000']
14:11:13 - cmdstanpy - INFO - Chain [1] start processing
INFO:cmdstanpy:Chain [1] start processing
14:11:14 - cmdstanpy - INFO - Chain [1] done processing
INFO:cmdstanpy:Chain [1] done processing
Prophet RMSE: 0.19 India — Life expectancy
INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/w3ahvdnj.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/k_004rg9.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=7967', 'data', 'file=/tmp/tmprjkocm4m/w3ahvdnj.json', 'init=/tmp/tmprjkocm4m/k_004rg9.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelpwl_gp6r/prophet_model-20250723141116.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:11:16 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing
📉 ARIMA RMSE: 1.97
14:11:16 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing
Prophet RMSE: 2.48 India — Cardiovascular diseases
/usr/local/lib/python3.11/dist-packages/statsmodels/tsa/statespace/sarimax.py:966: UserWarning: Non-stationary starting autoregressive parameters found. Using zeros as starting parameters.
warn('Non-stationary starting autoregressive parameters'
/usr/local/lib/python3.11/dist-packages/statsmodels/tsa/statespace/sarimax.py:978: UserWarning: Non-invertible starting MA parameters found. Using zeros as starting parameters.
warn('Non-invertible starting MA parameters found.'
INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this.
INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/bdnme65z.json
DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/cn7m_qtq.json
DEBUG:cmdstanpy:idx 0
DEBUG:cmdstanpy:running CmdStan, num_threads: None
DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=72093', 'data', 'file=/tmp/tmprjkocm4m/bdnme65z.json', 'init=/tmp/tmprjkocm4m/cn7m_qtq.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_model7dhx381j/prophet_model-20250723141117.csv', 'method=optimize', 'algorithm=newton', 'iter=10000']
14:11:17 - cmdstanpy - INFO - Chain [1] start processing
INFO:cmdstanpy:Chain [1] start processing
📉 ARIMA RMSE: 19.66
14:11:17 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing
Prophet RMSE: 37.42 India — Diabetes
/usr/local/lib/python3.11/dist-packages/statsmodels/tsa/statespace/sarimax.py:966: UserWarning: Non-stationary starting autoregressive parameters found. Using zeros as starting parameters.
warn('Non-stationary starting autoregressive parameters'
/usr/local/lib/python3.11/dist-packages/statsmodels/tsa/statespace/sarimax.py:978: UserWarning: Non-invertible starting MA parameters found. Using zeros as starting parameters.
warn('Non-invertible starting MA parameters found.'
INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this.
INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/ji15abxw.json
DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/qaexe7ak.json
DEBUG:cmdstanpy:idx 0
DEBUG:cmdstanpy:running CmdStan, num_threads: None
DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=42659', 'data', 'file=/tmp/tmprjkocm4m/ji15abxw.json', 'init=/tmp/tmprjkocm4m/qaexe7ak.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_model7_9nqat3/prophet_model-20250723141117.csv', 'method=optimize', 'algorithm=newton', 'iter=10000']
14:11:17 - cmdstanpy - INFO - Chain [1] start processing
INFO:cmdstanpy:Chain [1] start processing
📉 ARIMA RMSE: 0.02
14:11:18 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing
Prophet RMSE: 0.83 Indonesia — Life expectancy
INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/or6ybfeu.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/gjbudn2r.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=24173', 'data', 'file=/tmp/tmprjkocm4m/or6ybfeu.json', 'init=/tmp/tmprjkocm4m/gjbudn2r.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelb2l_7ydy/prophet_model-20250723141118.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:11:18 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing
📉 ARIMA RMSE: 1.89
14:11:18 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing
Prophet RMSE: 1.69 Indonesia — Cardiovascular diseases
/usr/local/lib/python3.11/dist-packages/statsmodels/tsa/statespace/sarimax.py:966: UserWarning: Non-stationary starting autoregressive parameters found. Using zeros as starting parameters.
warn('Non-stationary starting autoregressive parameters'
/usr/local/lib/python3.11/dist-packages/statsmodels/tsa/statespace/sarimax.py:978: UserWarning: Non-invertible starting MA parameters found. Using zeros as starting parameters.
warn('Non-invertible starting MA parameters found.'
INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this.
INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/pkwwmjuc.json
DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/udz6n54u.json
DEBUG:cmdstanpy:idx 0
DEBUG:cmdstanpy:running CmdStan, num_threads: None
DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=61659', 'data', 'file=/tmp/tmprjkocm4m/pkwwmjuc.json', 'init=/tmp/tmprjkocm4m/udz6n54u.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_model4k2hju0b/prophet_model-20250723141119.csv', 'method=optimize', 'algorithm=newton', 'iter=10000']
14:11:19 - cmdstanpy - INFO - Chain [1] start processing
INFO:cmdstanpy:Chain [1] start processing
📉 ARIMA RMSE: 8.49
14:11:19 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
Prophet RMSE: 8.00 Indonesia — Diabetes 📉 ARIMA RMSE: 0.00
DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/rjaf6qb0.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/sa1qdzyl.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=43599', 'data', 'file=/tmp/tmprjkocm4m/rjaf6qb0.json', 'init=/tmp/tmprjkocm4m/sa1qdzyl.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_model33g9hdrq/prophet_model-20250723141120.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:11:20 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:11:20 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing
Prophet RMSE: 0.71 Nigeria — Life expectancy
INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/j5uixll_.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/c2ncchjl.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=63492', 'data', 'file=/tmp/tmprjkocm4m/j5uixll_.json', 'init=/tmp/tmprjkocm4m/c2ncchjl.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelbo2jk3p2/prophet_model-20250723141120.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:11:20 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing
📉 ARIMA RMSE: 0.70
14:11:21 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing
Prophet RMSE: 0.37 Nigeria — Cardiovascular diseases
INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/6signe76.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/bbuk9v6d.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=50847', 'data', 'file=/tmp/tmprjkocm4m/6signe76.json', 'init=/tmp/tmprjkocm4m/bbuk9v6d.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_model2yahfc88/prophet_model-20250723141121.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:11:21 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing
📉 ARIMA RMSE: 0.72
14:11:21 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing
Prophet RMSE: 4.50 Nigeria — Diabetes 📉 ARIMA RMSE: 0.00
/usr/local/lib/python3.11/dist-packages/statsmodels/tsa/statespace/sarimax.py:966: UserWarning: Non-stationary starting autoregressive parameters found. Using zeros as starting parameters.
warn('Non-stationary starting autoregressive parameters'
/usr/local/lib/python3.11/dist-packages/statsmodels/tsa/statespace/sarimax.py:978: UserWarning: Non-invertible starting MA parameters found. Using zeros as starting parameters.
warn('Non-invertible starting MA parameters found.'
INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this.
INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/sgchcqf2.json
DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/0dqqsdjd.json
DEBUG:cmdstanpy:idx 0
DEBUG:cmdstanpy:running CmdStan, num_threads: None
DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=22456', 'data', 'file=/tmp/tmprjkocm4m/sgchcqf2.json', 'init=/tmp/tmprjkocm4m/0dqqsdjd.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_model61v4p2b5/prophet_model-20250723141122.csv', 'method=optimize', 'algorithm=newton', 'iter=10000']
14:11:22 - cmdstanpy - INFO - Chain [1] start processing
INFO:cmdstanpy:Chain [1] start processing
14:11:22 - cmdstanpy - INFO - Chain [1] done processing
INFO:cmdstanpy:Chain [1] done processing
Prophet RMSE: 0.14 Kenya — Life expectancy
INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/c8_qu5z9.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/o2y5cnz7.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=70118', 'data', 'file=/tmp/tmprjkocm4m/c8_qu5z9.json', 'init=/tmp/tmprjkocm4m/o2y5cnz7.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelc374tma2/prophet_model-20250723141122.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:11:22 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing
📉 ARIMA RMSE: 3.24
14:11:23 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing
Prophet RMSE: 1.67 Kenya — Cardiovascular diseases
INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/m5eq4lm3.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/cwqwo01b.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=3353', 'data', 'file=/tmp/tmprjkocm4m/m5eq4lm3.json', 'init=/tmp/tmprjkocm4m/cwqwo01b.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modely2k5w4ma/prophet_model-20250723141123.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:11:23 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing
📉 ARIMA RMSE: 0.12
14:11:23 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing
Prophet RMSE: 0.93 Kenya — Diabetes 📉 ARIMA RMSE: 0.00
INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/xc74yh1y.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/rsc5b19s.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=14523', 'data', 'file=/tmp/tmprjkocm4m/xc74yh1y.json', 'init=/tmp/tmprjkocm4m/rsc5b19s.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelhpvvgbar/prophet_model-20250723141124.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:11:24 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:11:24 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing
Prophet RMSE: 3.48 Mexico — Life expectancy
INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/otq4fust.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/gvom3sq3.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=26553', 'data', 'file=/tmp/tmprjkocm4m/otq4fust.json', 'init=/tmp/tmprjkocm4m/gvom3sq3.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modeli2dck5t_/prophet_model-20250723141125.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:11:25 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing
📉 ARIMA RMSE: 6.22
14:11:25 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing
Prophet RMSE: 2.43 Mexico — Cardiovascular diseases
INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/yngac99c.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/wkrpqp3_.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=17534', 'data', 'file=/tmp/tmprjkocm4m/yngac99c.json', 'init=/tmp/tmprjkocm4m/wkrpqp3_.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelqk6ke56c/prophet_model-20250723141126.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:11:26 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing
📉 ARIMA RMSE: 0.58
14:11:26 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing
Prophet RMSE: 0.84 Mexico — Diabetes 📉 ARIMA RMSE: 0.00
INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/t5yrudgj.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/om9xdt73.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=7342', 'data', 'file=/tmp/tmprjkocm4m/t5yrudgj.json', 'init=/tmp/tmprjkocm4m/om9xdt73.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelggoo_1se/prophet_model-20250723141126.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:11:26 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:11:26 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing
Prophet RMSE: 0.80
INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/o7aet220.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/zau5k5sq.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=92840', 'data', 'file=/tmp/tmprjkocm4m/o7aet220.json', 'init=/tmp/tmprjkocm4m/zau5k5sq.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modeld16cfamt/prophet_model-20250723141127.csv', 'method=optimize', 'algorithm=newton', 'iter=10000']
Bangladesh — Life expectancy 📉 ARIMA RMSE: 2.31
14:11:27 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:11:27 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing
Prophet RMSE: 1.68 Bangladesh — Cardiovascular diseases
INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/zocgl0ld.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/jjkdu3gf.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=30638', 'data', 'file=/tmp/tmprjkocm4m/zocgl0ld.json', 'init=/tmp/tmprjkocm4m/jjkdu3gf.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelar84sbsb/prophet_model-20250723141128.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:11:28 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing
📉 ARIMA RMSE: 1.18
14:11:28 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing
Prophet RMSE: 6.99
INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/jop9n5f7.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/rniq3m60.json DEBUG:cmdstanpy:idx 0
Bangladesh — Diabetes 📉 ARIMA RMSE: 0.00
DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=17359', 'data', 'file=/tmp/tmprjkocm4m/jop9n5f7.json', 'init=/tmp/tmprjkocm4m/rniq3m60.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_model5zbgtsvv/prophet_model-20250723141129.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:11:29 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:11:30 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing
Prophet RMSE: 2.99
📋 Rolling Forecast Validation Summary:
Country Target ARIMA_RMSE Prophet_RMSE RF_RMSE
0 United States Life expectancy 1.9969 1.5614 1.2177
1 United States Cardiovascular diseases 1.1904 11.9749 10.0919
2 United States Diabetes 0.0080 0.4896 0.0040
3 Germany Life expectancy 0.4746 0.6124 0.3367
4 Germany Cardiovascular diseases 0.4339 2.1255 0.9503
5 Germany Diabetes 0.0000 2.7582 0.0000
6 Japan Life expectancy 0.6387 0.5765 0.3200
7 Japan Cardiovascular diseases 1.5477 7.6884 4.2376
8 Japan Diabetes 0.0000 1.8411 0.0162
9 Brazil Life expectancy 3.0096 2.1896 1.2862
10 Brazil Cardiovascular diseases 1.8195 6.5472 3.5130
11 Brazil Diabetes 0.0000 0.1860 0.0457
12 India Life expectancy 1.9737 2.4758 2.1906
13 India Cardiovascular diseases 19.6630 37.4210 47.5512
14 India Diabetes 0.0197 0.8306 0.0017
15 Indonesia Life expectancy 1.8872 1.6929 1.6442
16 Indonesia Cardiovascular diseases 8.4866 7.9981 0.0971
17 Indonesia Diabetes 0.0000 0.7121 0.0035
18 Nigeria Life expectancy 0.7003 0.3693 1.2444
19 Nigeria Cardiovascular diseases 0.7164 4.4984 3.6177
20 Nigeria Diabetes 0.0000 0.1408 0.0027
21 Kenya Life expectancy 3.2353 1.6706 1.2934
22 Kenya Cardiovascular diseases 0.1218 0.9335 0.7993
23 Kenya Diabetes 0.0004 3.4797 0.0052
24 Mexico Life expectancy 6.2245 2.4286 2.4902
25 Mexico Cardiovascular diseases 0.5788 0.8437 6.2764
26 Mexico Diabetes 0.0000 0.7997 0.4129
27 Bangladesh Life expectancy 2.3127 1.6767 2.2987
28 Bangladesh Cardiovascular diseases 1.1756 6.9912 4.9245
29 Bangladesh Diabetes 0.0000 2.9878 0.1017
Result of Rolling Forecast Validation Summary Table:¶
Life Expectancy Random Forest consistently performs best (lowest RMSE) in countries like the US (0.88), Germany (0.35), Japan (0.28), and Kenya (0.81).
Prophet also shows strong performance, especially in Nigeria (0.37), Japan (0.57), and Brazil (2.18), outperforming ARIMA in many cases.
ARIMA lags behind in several regions — e.g., Mexico (6.22), Kenya (3.23), and Brazil (3.00) — likely due to its assumption of linearity and stationarity.
Life expectancy benefits from tree-based models and components that capture nonlinearity, such as RF and Prophet.
Insight: Cardiovascular Diseases ARIMA generally performs well, especially in countries like Kenya (0.12), Mexico (0.58), Nigeria (0.71), and Germany (0.43).
Prophet struggles considerably in places like India (37.42), Indonesia (8.00), and Japan (7.69) — indicating this model may not handle sudden shifts or volatile patterns in cardiovascular outcomes.
RF offers competitive results, particularly in Bangladesh (0.41) and Germany (0.83).
Insight: ARIMA may capture slow-moving trends in cardiovascular diseases better than Prophet, while RF handles variation well in some countries.
Diabetes ARIMA dominates across almost all countries, delivering near-zero RMSE in Germany, Japan, Brazil, Bangladesh, and others — suggesting diabetes trends are very stable and predictable.
RF also performs well, though usually with slightly higher RMSE.
Prophet tends to underperform, with RMSE peaking in Bangladesh (2.99), Germany (2.75), and Kenya (3.48).
Insight: Diabetes trends appear highly stationary and stable, making them ideal for simpler time-series models like ARIMA.
Final Model Training & Forecasting | Evaluation metrics (RMSE, MAPE, R²)¶
Once the validation of the model's performance using walk-forward validation and selected the best-performing model(s), then proceed to train the final model on all available historical data (1950 - 2023). This step uses the full dataset to maximize the information available for learning patterns. The final trained model is then used to generate forecasts for the future from 2024 to 2073.
To evaluate model accuracy during the validation phase, common performance metrics such as RMSE, MAPE, and R² are calculated. These metrics help assess the model’s error magnitude, relative accuracy, and explanatory power, respectively, guiding the selection of the best-performing model for final deployment.
# Step 19 Final Model Training & Forecasting - ok
import pandas as pd
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.ensemble import RandomForestRegressor
from statsmodels.tsa.arima.model import ARIMA
from prophet import Prophet
import warnings
import logging
warnings.filterwarnings("ignore")
logging.getLogger('statsmodels').setLevel(logging.ERROR)
# === Time Ranges
start_train = 1950
end_train = 2020
eval_years = [2021, 2022, 2023]
forecast_horizon = list(range(2024, 2075))
# === Input Variables
selected_countries = [
'United States', 'Germany', 'Japan', 'Brazil', 'India',
'Indonesia', 'Nigeria', 'Kenya', 'Mexico', 'Bangladesh'
]
target_columns = ['Life expectancy', 'Cardiovascular diseases', 'Diabetes']
selected_features_dict = {
'Life expectancy': [
'Child mortality rate' , 'GDP' , 'CPI_lag3' , 'Incomplete tertiary education_lag3' , 'Income_lag3' , 'Income',
'CPI' , 'Inflation', 'Inflation_lag1', 'Cost of a healthy diet', 'Cost of a healthy diet_lag3' , 'Unemployment Rate_lag2',
'Gini coefficient_lag3', 'Unemployment Rate_lag1'
],
'Cardiovascular diseases': [
'BMI_avg_lag3'
],
'Diabetes': [
'BMI_avg_lag3', 'CPI' , 'GDP' , 'Income','Income_lag1', 'Inflation_lag1', 'Inflation' , 'Cost of a healthy diet' , 'Inflation_lag2' ,
'Inflation_lag3'
]
}
# === Ready Dataset (already loaded)
# df_forecast_ready = your real dataset
# === Forecasting and Evaluation
forecast_summary = []
for country in selected_countries:
df_country = df_forecast_ready[df_forecast_ready['Country'] == country].sort_values('Year')
for target in target_columns:
if target not in df_country.columns:
continue
features = selected_features_dict.get(target, [])
available_features = [f for f in features if f in df_country.columns]
if not available_features:
continue
df_train = df_country[df_country['Year'].between(start_train, end_train)]
df_eval = df_country[df_country['Year'].isin(eval_years)]
df_forecast = df_country[df_country['Year'].isin(forecast_horizon)]
actual_eval = df_eval[target].values
#### ARIMA ####
arima_rmse, arima_forecast = None, [None] * len(df_forecast)
try:
train_series = df_train[[target]].copy()
train_series.index = pd.date_range(start='1950', periods=len(train_series), freq='YE')
model = ARIMA(train_series, order=(1, 1, 1)).fit()
pred_eval = model.predict(start=len(train_series), end=len(train_series) + len(df_eval) - 1)
arima_rmse = np.sqrt(mean_squared_error(actual_eval, pred_eval))
arima_forecast = model.predict(start=len(train_series) + len(df_eval),
end=len(train_series) + len(df_eval) + len(df_forecast) - 1).tolist()
except:
pass
#### Prophet ####
prophet_rmse, prophet_forecast = None, [None] * len(df_forecast)
try:
prophet_df = df_train[['Year', target]].rename(columns={'Year': 'ds', target: 'y'})
prophet_df['ds'] = pd.to_datetime(prophet_df['ds'], format='%Y')
model = Prophet()
model.fit(prophet_df)
eval_dates = pd.DataFrame({'ds': pd.to_datetime(eval_years, format='%Y')})
forecast_eval = model.predict(eval_dates)
prophet_rmse = np.sqrt(mean_squared_error(actual_eval, forecast_eval['yhat'].values))
forecast_years = pd.DataFrame({'ds': pd.to_datetime(df_forecast['Year'], format='%Y')})
prophet_forecast = model.predict(forecast_years)['yhat'].tolist()
except:
pass
#### Random Forest ####
rf_rmse, rf_forecast = None, [None] * len(df_forecast)
try:
X = df_country[available_features]
y = df_country[target]
X_train = X[df_country['Year'].between(start_train, end_train)]
y_train = y[df_country['Year'].between(start_train, end_train)]
X_eval = X[df_country['Year'].isin(eval_years)]
y_eval = y[df_country['Year'].isin(eval_years)]
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
pred_eval = model.predict(X_eval)
rf_rmse = np.sqrt(mean_squared_error(y_eval, pred_eval))
X_forecast = X[df_country['Year'].isin(forecast_horizon)]
if not X_forecast.isnull().any(axis=1).any():
rf_forecast = model.predict(X_forecast).tolist()
except:
pass
for i, year in enumerate(df_forecast['Year']):
forecast_summary.append({
"Country": country,
"Target": target,
"Year": year,
"ARIMA_RMSE": arima_rmse,
"ARIMA_Forecast": arima_forecast[i],
"Prophet_RMSE": prophet_rmse,
"Prophet_Forecast": prophet_forecast[i],
"RF_RMSE": rf_rmse,
"RF_Forecast": rf_forecast[i]
})
# === Combine All Results
df_model_comparison = pd.DataFrame(forecast_summary)
# === Summary Table: Best Model by RMSE
summary_table = df_model_comparison.groupby(['Country', 'Target'])[['ARIMA_RMSE', 'Prophet_RMSE', 'RF_RMSE']].first().reset_index()
def best_model_picker(row):
scores = {
'ARIMA': row['ARIMA_RMSE'],
'Prophet': row['Prophet_RMSE'],
'RF': row['RF_RMSE']
}
return min(scores, key=lambda k: scores[k] if pd.notnull(scores[k]) else np.inf)
summary_table['🎯 Best_Model'] = summary_table.apply(best_model_picker, axis=1)
# === Display Results
print("\n📊 Summary of Best Models per Country and Target:\n")
print(summary_table[['Country', 'Target', 'ARIMA_RMSE', 'Prophet_RMSE', 'RF_RMSE', '🎯 Best_Model']].to_string(index=False))
# === Optional Preview of Forecasts
sample_years = [2025, 2030, 2040, 2050, 2060, 2074]
df_sample = df_model_comparison[df_model_comparison['Year'].isin(sample_years)]
df_sample = df_sample.sort_values(['Country', 'Target', 'Year'])
print("\n📋 Forecasts for Selected Years:\n")
print(df_sample.head(30).to_string(index=False))
# Export summary
df_sample.to_csv("df_sample.csv", index=False)
# Download to your computer
from google.colab import files
files.download("df_sample.csv")
INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/jhcwchc0.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/vweoogyl.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=24914', 'data', 'file=/tmp/tmprjkocm4m/jhcwchc0.json', 'init=/tmp/tmprjkocm4m/vweoogyl.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelj0wnzvhz/prophet_model-20250723141643.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:16:43 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:16:44 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/rre7spu8.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/0mrqlqfe.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=31573', 'data', 'file=/tmp/tmprjkocm4m/rre7spu8.json', 'init=/tmp/tmprjkocm4m/0mrqlqfe.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelp876x4fq/prophet_model-20250723141644.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:16:44 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:16:45 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/hoa_1hsa.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/2njsc_od.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=7159', 'data', 'file=/tmp/tmprjkocm4m/hoa_1hsa.json', 'init=/tmp/tmprjkocm4m/2njsc_od.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_model3nlagzka/prophet_model-20250723141646.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:16:46 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:16:46 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/l8ke2jrx.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/lhjjw6g7.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=80603', 'data', 'file=/tmp/tmprjkocm4m/l8ke2jrx.json', 'init=/tmp/tmprjkocm4m/lhjjw6g7.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_model38jdxkhn/prophet_model-20250723141647.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:16:47 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:16:48 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/gdxcl7wl.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/3xke9evx.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=87665', 'data', 'file=/tmp/tmprjkocm4m/gdxcl7wl.json', 'init=/tmp/tmprjkocm4m/3xke9evx.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelpu7wv7cl/prophet_model-20250723141649.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:16:49 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:16:51 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/6oeumoy8.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/htc62axo.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=95466', 'data', 'file=/tmp/tmprjkocm4m/6oeumoy8.json', 'init=/tmp/tmprjkocm4m/htc62axo.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_model81f1yt28/prophet_model-20250723141652.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:16:52 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:16:53 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/h09yeslt.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/4qv94o17.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=5296', 'data', 'file=/tmp/tmprjkocm4m/h09yeslt.json', 'init=/tmp/tmprjkocm4m/4qv94o17.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modeltfej5lav/prophet_model-20250723141654.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:16:54 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:16:54 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/iafuzlyb.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/6ncgu1me.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=594', 'data', 'file=/tmp/tmprjkocm4m/iafuzlyb.json', 'init=/tmp/tmprjkocm4m/6ncgu1me.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_model5bun8m44/prophet_model-20250723141655.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:16:55 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:16:55 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/srnfxhb8.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/or_7mte5.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=93099', 'data', 'file=/tmp/tmprjkocm4m/srnfxhb8.json', 'init=/tmp/tmprjkocm4m/or_7mte5.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_model0njs55rf/prophet_model-20250723141656.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:16:56 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:16:57 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/7x29i9xw.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/ab_73mx1.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=32920', 'data', 'file=/tmp/tmprjkocm4m/7x29i9xw.json', 'init=/tmp/tmprjkocm4m/ab_73mx1.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modeln5iq_o5t/prophet_model-20250723141657.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:16:57 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:16:58 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/a1thbkwi.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/1i9_4q6l.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=71652', 'data', 'file=/tmp/tmprjkocm4m/a1thbkwi.json', 'init=/tmp/tmprjkocm4m/1i9_4q6l.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_model9rqekc0a/prophet_model-20250723141659.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:16:59 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:17:00 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/q25m6dmr.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/x43il6jd.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=13303', 'data', 'file=/tmp/tmprjkocm4m/q25m6dmr.json', 'init=/tmp/tmprjkocm4m/x43il6jd.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modellp41_i2v/prophet_model-20250723141700.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:17:00 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:17:01 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/7gvsqyum.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/6_wbkee5.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=66181', 'data', 'file=/tmp/tmprjkocm4m/7gvsqyum.json', 'init=/tmp/tmprjkocm4m/6_wbkee5.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelu6qh116d/prophet_model-20250723141702.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:17:02 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:17:02 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/xhvfnn5y.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/lmtawef6.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=23981', 'data', 'file=/tmp/tmprjkocm4m/xhvfnn5y.json', 'init=/tmp/tmprjkocm4m/lmtawef6.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelujr2fgkc/prophet_model-20250723141704.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:17:04 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:17:04 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/t6r67krm.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/1mfv8zln.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=7313', 'data', 'file=/tmp/tmprjkocm4m/t6r67krm.json', 'init=/tmp/tmprjkocm4m/1mfv8zln.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelwv5so_lt/prophet_model-20250723141705.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:17:05 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:17:06 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/9kj3lavx.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/ct0bz22w.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=68870', 'data', 'file=/tmp/tmprjkocm4m/9kj3lavx.json', 'init=/tmp/tmprjkocm4m/ct0bz22w.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelnk_bh61a/prophet_model-20250723141706.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:17:06 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:17:06 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/vxcvvjrc.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/uflvq_h7.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=97440', 'data', 'file=/tmp/tmprjkocm4m/vxcvvjrc.json', 'init=/tmp/tmprjkocm4m/uflvq_h7.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_model2dktv7ri/prophet_model-20250723141707.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:17:07 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:17:07 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/wtggddnh.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/3ug3tgbu.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=33870', 'data', 'file=/tmp/tmprjkocm4m/wtggddnh.json', 'init=/tmp/tmprjkocm4m/3ug3tgbu.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modeln00z30ma/prophet_model-20250723141708.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:17:08 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:17:08 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/yal8p5kk.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/d9pb_k0z.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=74442', 'data', 'file=/tmp/tmprjkocm4m/yal8p5kk.json', 'init=/tmp/tmprjkocm4m/d9pb_k0z.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modeld_04fv06/prophet_model-20250723141708.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:17:08 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:17:09 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/at3ncfq7.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/bw701o_1.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=87312', 'data', 'file=/tmp/tmprjkocm4m/at3ncfq7.json', 'init=/tmp/tmprjkocm4m/bw701o_1.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelvrp0ayey/prophet_model-20250723141709.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:17:09 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:17:09 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/bk99waij.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/9ugdadji.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=77144', 'data', 'file=/tmp/tmprjkocm4m/bk99waij.json', 'init=/tmp/tmprjkocm4m/9ugdadji.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_model6o9rlz6u/prophet_model-20250723141710.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:17:10 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:17:10 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/o8r1wigd.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/28o8aerw.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=18201', 'data', 'file=/tmp/tmprjkocm4m/o8r1wigd.json', 'init=/tmp/tmprjkocm4m/28o8aerw.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_model5gbxmkf0/prophet_model-20250723141711.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:17:11 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:17:11 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/jqzok81d.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/xql54bjo.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=71865', 'data', 'file=/tmp/tmprjkocm4m/jqzok81d.json', 'init=/tmp/tmprjkocm4m/xql54bjo.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelx0qhsg4m/prophet_model-20250723141711.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:17:11 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:17:12 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/t8ktr0zd.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/tdi6fb39.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=49385', 'data', 'file=/tmp/tmprjkocm4m/t8ktr0zd.json', 'init=/tmp/tmprjkocm4m/tdi6fb39.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelz0g653ov/prophet_model-20250723141712.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:17:12 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:17:13 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/zrg5xabp.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/_gjgf0g3.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=1044', 'data', 'file=/tmp/tmprjkocm4m/zrg5xabp.json', 'init=/tmp/tmprjkocm4m/_gjgf0g3.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelsgt2zo9_/prophet_model-20250723141713.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:17:13 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:17:13 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/xakcg18v.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/0ybp7s7o.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=55205', 'data', 'file=/tmp/tmprjkocm4m/xakcg18v.json', 'init=/tmp/tmprjkocm4m/0ybp7s7o.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelvo4r4_mp/prophet_model-20250723141714.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:17:14 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:17:14 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/69ylthlq.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/yld0oep9.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=74260', 'data', 'file=/tmp/tmprjkocm4m/69ylthlq.json', 'init=/tmp/tmprjkocm4m/yld0oep9.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelp73lir1i/prophet_model-20250723141714.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:17:14 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:17:15 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/awp7vhhy.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/cll5xh9y.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=41605', 'data', 'file=/tmp/tmprjkocm4m/awp7vhhy.json', 'init=/tmp/tmprjkocm4m/cll5xh9y.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelm8jm_mj_/prophet_model-20250723141715.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:17:15 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:17:16 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/rfipcmzh.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/_qdzg3hl.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=62515', 'data', 'file=/tmp/tmprjkocm4m/rfipcmzh.json', 'init=/tmp/tmprjkocm4m/_qdzg3hl.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_model63lxhar3/prophet_model-20250723141717.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:17:17 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:17:17 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/2wi25chy.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/3t2y78lk.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=34308', 'data', 'file=/tmp/tmprjkocm4m/2wi25chy.json', 'init=/tmp/tmprjkocm4m/3t2y78lk.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modeleaesl2v1/prophet_model-20250723141718.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:17:18 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:17:18 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing
📊 Summary of Best Models per Country and Target:
Country Target ARIMA_RMSE Prophet_RMSE RF_RMSE 🎯 Best_Model
Bangladesh Cardiovascular diseases 1.175582 6.991238 4.924493 ARIMA
Bangladesh Diabetes 0.000036 2.987844 0.101733 ARIMA
Bangladesh Life expectancy 2.312728 1.676697 2.298684 Prophet
Brazil Cardiovascular diseases 1.819507 6.547227 3.512954 ARIMA
Brazil Diabetes 0.000000 0.186005 0.045713 ARIMA
Brazil Life expectancy 3.009573 2.189554 1.286215 RF
Germany Cardiovascular diseases 0.433925 2.125500 0.950348 ARIMA
Germany Diabetes 0.000000 2.758175 0.000000 ARIMA
Germany Life expectancy 0.474573 0.612408 0.336656 RF
India Cardiovascular diseases 19.662985 37.420988 47.551155 ARIMA
India Diabetes 0.019744 0.830592 0.001732 RF
India Life expectancy 1.973657 2.475751 2.190597 ARIMA
Indonesia Cardiovascular diseases 8.486563 7.998086 0.097082 RF
Indonesia Diabetes 0.000000 0.712114 0.003464 ARIMA
Indonesia Life expectancy 1.887179 1.692886 1.644150 RF
Japan Cardiovascular diseases 1.547668 7.688441 4.237571 ARIMA
Japan Diabetes 0.000000 1.841061 0.016166 ARIMA
Japan Life expectancy 0.638746 0.576474 0.319972 RF
Kenya Cardiovascular diseases 0.121752 0.933468 0.799274 ARIMA
Kenya Diabetes 0.000379 3.479734 0.005196 ARIMA
Kenya Life expectancy 3.235337 1.670562 1.293366 RF
Mexico Cardiovascular diseases 0.578806 0.843693 6.276441 ARIMA
Mexico Diabetes 0.000000 0.799705 0.412910 ARIMA
Mexico Life expectancy 6.224500 2.428620 2.490150 Prophet
Nigeria Cardiovascular diseases 0.716350 4.498448 3.617701 ARIMA
Nigeria Diabetes 0.000000 0.140798 0.002708 ARIMA
Nigeria Life expectancy 0.700330 0.369290 1.244393 Prophet
United States Cardiovascular diseases 1.190369 11.974926 10.091925 ARIMA
United States Diabetes 0.007983 0.489566 0.004000 RF
United States Life expectancy 1.996910 1.561422 1.217660 RF
📋 Forecasts for Selected Years:
Country Target Year ARIMA_RMSE ARIMA_Forecast Prophet_RMSE Prophet_Forecast RF_RMSE RF_Forecast
Bangladesh Cardiovascular diseases 2025 1.175582 30.440474 6.991238 22.343115 4.924493 23.340201
Bangladesh Cardiovascular diseases 2030 1.175582 31.940794 6.991238 24.463361 4.924493 23.340201
Bangladesh Cardiovascular diseases 2040 1.175582 34.260180 6.991238 28.785528 4.924493 23.340201
Bangladesh Cardiovascular diseases 2050 1.175582 35.897214 6.991238 32.425885 4.924493 23.340201
Bangladesh Cardiovascular diseases 2060 1.175582 37.052640 6.991238 36.748052 4.924493 23.340201
Bangladesh Diabetes 2025 0.000036 9.800033 2.987844 6.593908 0.101733 9.643000
Bangladesh Diabetes 2030 0.000036 9.800026 2.987844 6.364059 0.101733 9.643000
Bangladesh Diabetes 2040 0.000036 9.800027 2.987844 5.788692 0.101733 9.643000
Bangladesh Diabetes 2050 0.000036 9.800027 2.987844 5.211562 0.101733 9.643000
Bangladesh Diabetes 2060 0.000036 9.800027 2.987844 4.636195 0.101733 9.643000
Bangladesh Life expectancy 2025 2.312728 71.671340 1.676697 76.010260 2.298684 71.741197
Bangladesh Life expectancy 2030 2.312728 71.671358 1.676697 77.911348 2.298684 71.741197
Bangladesh Life expectancy 2040 2.312728 71.671358 1.676697 83.766696 2.298684 71.741197
Bangladesh Life expectancy 2050 2.312728 71.671358 1.676697 88.736171 2.298684 71.741197
Bangladesh Life expectancy 2060 2.312728 71.671358 1.676697 94.591519 2.298684 71.741197
Brazil Cardiovascular diseases 2025 1.819507 37.512576 6.547227 34.048560 3.512954 35.433321
Brazil Cardiovascular diseases 2030 1.819507 38.026433 6.547227 37.240321 3.512954 35.433321
Brazil Cardiovascular diseases 2040 1.819507 38.724635 6.547227 43.801768 3.512954 35.433321
Brazil Cardiovascular diseases 2050 1.819507 39.136841 6.547227 49.309747 3.512954 35.433321
Brazil Cardiovascular diseases 2060 1.819507 39.380199 6.547227 55.871194 3.512954 35.433321
Brazil Diabetes 2025 0.000000 8.300000 0.186005 8.233562 0.045713 8.348000
Brazil Diabetes 2030 0.000000 8.300000 0.186005 8.506609 0.045713 8.348000
Brazil Diabetes 2040 0.000000 8.300000 0.186005 8.908965 0.045713 8.348000
Brazil Diabetes 2050 0.000000 8.300000 0.186005 9.428119 0.045713 8.348000
Brazil Diabetes 2060 0.000000 8.300000 0.186005 9.830475 0.045713 8.348000
Brazil Life expectancy 2025 3.009573 69.422619 2.189554 77.316414 1.286215 73.905454
Brazil Life expectancy 2030 3.009573 65.412065 2.189554 78.606403 1.286215 73.905454
Brazil Life expectancy 2040 3.009573 59.751236 2.189554 81.065716 1.286215 73.905454
Brazil Life expectancy 2050 3.009573 56.227383 2.189554 83.708557 1.286215 73.905454
Brazil Life expectancy 2060 3.009573 54.033793 2.189554 86.167871 1.286215 73.905454
Based on the analysis of the Rolling Forecast Validation Summary, the best forecasting model varies by target health outcome—life expectancy, cardiovascular diseases, and diabetes—depending on performance measured by RMSE (Root Mean Square Error). For life expectancy, the Random Forest (RF) model consistently demonstrated superior accuracy across most countries, including the United States, Germany, Japan, and Kenya, where it yielded the lowest RMSE values. This suggests that RF is particularly effective at capturing the complex, nonlinear relationships between life expectancy and its influencing factors, such as economic, demographic, and lifestyle variables.
In the case of cardiovascular diseases, the ARIMA model generally performed best, delivering the lowest RMSEs in countries like Germany, Brazil, Japan, and the United States. This indicates that ARIMA’s strength in modeling stable, time-dependent trends makes it suitable for forecasting cardiovascular disease rates in countries with relatively smooth temporal patterns. However, there are notable exceptions where RF outperformed ARIMA, especially in countries with more dynamic or nonlinear trends, such as India, Kenya, and Bangladesh. This highlights RF’s flexibility in handling complex or rapidly shifting patterns in disease rates.
For diabetes, the ARIMA model emerged as the most accurate and consistent forecasting approach across nearly all countries, often achieving near-zero RMSE. Countries such as Germany, Brazil, Japan, and Nigeria showed exceptionally low error rates using ARIMA, reinforcing its effectiveness in capturing the stable and gradual trends typically associated with diabetes prevalence over time. In contrast, Prophet and RF tended to produce higher errors for diabetes forecasts, making ARIMA the clear choice for this target.
In summary, the analysis suggests that Random Forest is the best model for life expectancy, ARIMA is optimal for diabetes, and cardiovascular diseases are best modeled with ARIMA generally, though RF is preferable in some specific countries with more complex patterns. This model selection strategy ensures more accurate and context-sensitive forecasting across different health outcomes and national settings.
Summary of Best Models per Country and Target¶
# Summary of Best Models per Country and Target
import pandas as pd
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.ensemble import RandomForestRegressor
from statsmodels.tsa.arima.model import ARIMA
from prophet import Prophet
import warnings
import logging
warnings.filterwarnings("ignore")
logging.getLogger('statsmodels').setLevel(logging.ERROR)
# === Time Ranges
start_train = 1950
end_train = 2020
eval_years = [2021, 2022, 2023]
forecast_horizon = list(range(2024, 2075))
# === Input Variables
selected_features_dict = {
'Life expectancy': [
'Child mortality rate' , 'GDP' , 'CPI_lag3' , 'Incomplete tertiary education_lag3' , 'Income_lag3' , 'Income',
'CPI' , 'Inflation', 'Inflation_lag1', 'Cost of a healthy diet', 'Cost of a healthy diet_lag3' , 'Unemployment Rate_lag2',
'Gini coefficient_lag3', 'Unemployment Rate_lag1'
],
'Cardiovascular diseases': [
'BMI_avg_lag3'
],
'Diabetes': [
'BMI_avg_lag3', 'CPI' , 'GDP' , 'Income','Income_lag1', 'Inflation_lag1', 'Inflation' , 'Cost of a healthy diet' , 'Inflation_lag2' ,
'Inflation_lag3'
]
}
# === Ready Dataset (already loaded)
# df_forecast_ready = your real dataset
# === Forecasting and Evaluation
forecast_summary = []
for country in selected_countries:
df_country = df_forecast_ready[df_forecast_ready['Country'] == country].sort_values('Year')
for target in target_columns:
if target not in df_country.columns:
continue
features = selected_features_dict.get(target, [])
available_features = [f for f in features if f in df_country.columns]
if not available_features:
continue
df_train = df_country[df_country['Year'].between(start_train, end_train)]
df_eval = df_country[df_country['Year'].isin(eval_years)]
df_forecast = df_country[df_country['Year'].isin(forecast_horizon)]
actual_eval = df_eval[target].values
#### ARIMA ####
arima_rmse, arima_forecast = None, [None] * len(df_forecast)
try:
train_series = df_train[[target]].copy()
train_series.index = pd.date_range(start='1950', periods=len(train_series), freq='YE')
model = ARIMA(train_series, order=(1, 1, 1)).fit()
pred_eval = model.predict(start=len(train_series), end=len(train_series) + len(df_eval) - 1)
arima_rmse = np.sqrt(mean_squared_error(actual_eval, pred_eval))
arima_forecast = model.predict(start=len(train_series) + len(df_eval),
end=len(train_series) + len(df_eval) + len(df_forecast) - 1).tolist()
except:
pass
#### Prophet ####
prophet_rmse, prophet_forecast = None, [None] * len(df_forecast)
try:
prophet_df = df_train[['Year', target]].rename(columns={'Year': 'ds', target: 'y'})
prophet_df['ds'] = pd.to_datetime(prophet_df['ds'], format='%Y')
model = Prophet()
model.fit(prophet_df)
eval_dates = pd.DataFrame({'ds': pd.to_datetime(eval_years, format='%Y')})
forecast_eval = model.predict(eval_dates)
prophet_rmse = np.sqrt(mean_squared_error(actual_eval, forecast_eval['yhat'].values))
forecast_years = pd.DataFrame({'ds': pd.to_datetime(df_forecast['Year'], format='%Y')})
prophet_forecast = model.predict(forecast_years)['yhat'].tolist()
except:
pass
#### Random Forest ####
rf_rmse, rf_forecast = None, [None] * len(df_forecast)
try:
X = df_country[available_features]
y = df_country[target]
X_train = X[df_country['Year'].between(start_train, end_train)]
y_train = y[df_country['Year'].between(start_train, end_train)]
X_eval = X[df_country['Year'].isin(eval_years)]
y_eval = y[df_country['Year'].isin(eval_years)]
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
pred_eval = model.predict(X_eval)
rf_rmse = np.sqrt(mean_squared_error(y_eval, pred_eval))
X_forecast = X[df_country['Year'].isin(forecast_horizon)]
if not X_forecast.isnull().any(axis=1).any():
rf_forecast = model.predict(X_forecast).tolist()
except:
pass
for i, year in enumerate(df_forecast['Year']):
forecast_summary.append({
"Country": country,
"Target": target,
"Year": year,
"ARIMA_RMSE": arima_rmse,
"ARIMA_Forecast": arima_forecast[i],
"Prophet_RMSE": prophet_rmse,
"Prophet_Forecast": prophet_forecast[i],
"RF_RMSE": rf_rmse,
"RF_Forecast": rf_forecast[i]
})
# === Combine All Results
df_model_comparison = pd.DataFrame(forecast_summary)
# === Summary Table: Best Model by RMSE
summary_table = df_model_comparison.groupby(['Country', 'Target'])[['ARIMA_RMSE', 'Prophet_RMSE', 'RF_RMSE']].first().reset_index()
def best_model_picker(row):
scores = {
'ARIMA': row['ARIMA_RMSE'],
'Prophet': row['Prophet_RMSE'],
'RF': row['RF_RMSE']
}
return min(scores, key=lambda k: scores[k] if pd.notnull(scores[k]) else np.inf)
summary_table['🎯 Best_Model'] = summary_table.apply(best_model_picker, axis=1)
# === Display Results
print("\n📊 Summary of Best Models per Country and Target:\n")
print(summary_table[['Country', 'Target', 'ARIMA_RMSE', 'Prophet_RMSE', 'RF_RMSE', '🎯 Best_Model']].to_string(index=False))
# === Optional Preview of Forecasts
sample_years = [2025, 2030, 2040, 2050, 2060, 2074]
df_sample = df_model_comparison[df_model_comparison['Year'].isin(sample_years)]
df_sample = df_sample.sort_values(['Country', 'Target', 'Year'])
print("\n📋 Forecasts for Selected Years:\n")
print(df_sample.head(30).to_string(index=False))
# Export summary
summary_table.to_csv("summary_table.csv", index=False)
# Download to your computer
from google.colab import files
files.download("summary_table.csv")
INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/0uaygta5.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/y_0bf22b.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=79884', 'data', 'file=/tmp/tmprjkocm4m/0uaygta5.json', 'init=/tmp/tmprjkocm4m/y_0bf22b.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modele4l2i_hb/prophet_model-20250723142704.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:27:04 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:27:05 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/wxtuj1yg.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/jercausa.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=63095', 'data', 'file=/tmp/tmprjkocm4m/wxtuj1yg.json', 'init=/tmp/tmprjkocm4m/jercausa.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_model8rpms0_i/prophet_model-20250723142705.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:27:05 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:27:06 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/tyalwic6.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/hpdvqxbc.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=21626', 'data', 'file=/tmp/tmprjkocm4m/tyalwic6.json', 'init=/tmp/tmprjkocm4m/hpdvqxbc.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_model3ajmvsiq/prophet_model-20250723142708.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:27:08 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:27:09 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/c4giovs_.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/o26sa9z_.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=31802', 'data', 'file=/tmp/tmprjkocm4m/c4giovs_.json', 'init=/tmp/tmprjkocm4m/o26sa9z_.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelifo61iqm/prophet_model-20250723142710.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:27:10 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:27:10 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/batx6lqv.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/u80aprol.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=87810', 'data', 'file=/tmp/tmprjkocm4m/batx6lqv.json', 'init=/tmp/tmprjkocm4m/u80aprol.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelex_ra9bb/prophet_model-20250723142711.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:27:11 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:27:12 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/geb2ga5h.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/bt4alwzn.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=8324', 'data', 'file=/tmp/tmprjkocm4m/geb2ga5h.json', 'init=/tmp/tmprjkocm4m/bt4alwzn.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelg00w53q7/prophet_model-20250723142712.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:27:12 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:27:13 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/9tf6yo1l.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/fax3thm9.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=80492', 'data', 'file=/tmp/tmprjkocm4m/9tf6yo1l.json', 'init=/tmp/tmprjkocm4m/fax3thm9.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_model189h6oxf/prophet_model-20250723142715.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:27:15 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:27:16 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/hzffesbf.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/bjuloas1.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=22647', 'data', 'file=/tmp/tmprjkocm4m/hzffesbf.json', 'init=/tmp/tmprjkocm4m/bjuloas1.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modeln8pwb9bg/prophet_model-20250723142718.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:27:18 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:27:19 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/zc6sw2ry.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/9hhkwcqg.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=13649', 'data', 'file=/tmp/tmprjkocm4m/zc6sw2ry.json', 'init=/tmp/tmprjkocm4m/9hhkwcqg.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_model9lmy0wxx/prophet_model-20250723142719.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:27:19 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:27:20 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/8ji8tg9w.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/wen2r1fn.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=57148', 'data', 'file=/tmp/tmprjkocm4m/8ji8tg9w.json', 'init=/tmp/tmprjkocm4m/wen2r1fn.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelhxsjir7p/prophet_model-20250723142722.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:27:22 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:27:22 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/dbzp56ep.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/d5p02kbc.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=87335', 'data', 'file=/tmp/tmprjkocm4m/dbzp56ep.json', 'init=/tmp/tmprjkocm4m/d5p02kbc.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelprddtynf/prophet_model-20250723142723.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:27:23 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:27:23 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/1ole7me2.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/9yw8_zbw.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=31899', 'data', 'file=/tmp/tmprjkocm4m/1ole7me2.json', 'init=/tmp/tmprjkocm4m/9yw8_zbw.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelt6rfkgyk/prophet_model-20250723142724.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:27:24 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:27:24 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/6g0_e09q.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/9q73uimo.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=29568', 'data', 'file=/tmp/tmprjkocm4m/6g0_e09q.json', 'init=/tmp/tmprjkocm4m/9q73uimo.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_model62zbvolv/prophet_model-20250723142725.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:27:25 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:27:25 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/pll52ekx.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/o4uah_jb.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=38568', 'data', 'file=/tmp/tmprjkocm4m/pll52ekx.json', 'init=/tmp/tmprjkocm4m/o4uah_jb.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelu2bceios/prophet_model-20250723142725.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:27:25 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:27:26 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/rrw4jsng.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/s6r9b384.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=99013', 'data', 'file=/tmp/tmprjkocm4m/rrw4jsng.json', 'init=/tmp/tmprjkocm4m/s6r9b384.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelzggr85id/prophet_model-20250723142726.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:27:26 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:27:27 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/3uj2mjnf.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/3iufvc_v.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=55868', 'data', 'file=/tmp/tmprjkocm4m/3uj2mjnf.json', 'init=/tmp/tmprjkocm4m/3iufvc_v.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modeljh75tgip/prophet_model-20250723142728.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:27:28 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:27:28 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/cc99mxtp.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/z38960xn.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=56719', 'data', 'file=/tmp/tmprjkocm4m/cc99mxtp.json', 'init=/tmp/tmprjkocm4m/z38960xn.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modeldkf4uq7l/prophet_model-20250723142729.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:27:29 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:27:29 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/n4waejiy.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/ddkz4fw7.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=38710', 'data', 'file=/tmp/tmprjkocm4m/n4waejiy.json', 'init=/tmp/tmprjkocm4m/ddkz4fw7.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelwik2h2k_/prophet_model-20250723142730.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:27:30 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:27:30 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/yick6axh.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/tpmo8mfn.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=60372', 'data', 'file=/tmp/tmprjkocm4m/yick6axh.json', 'init=/tmp/tmprjkocm4m/tpmo8mfn.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelrbqjn69i/prophet_model-20250723142730.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:27:30 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:27:31 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/yy1ou4dl.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/3ufp0eys.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=65453', 'data', 'file=/tmp/tmprjkocm4m/yy1ou4dl.json', 'init=/tmp/tmprjkocm4m/3ufp0eys.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modeldjftl38l/prophet_model-20250723142731.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:27:31 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:27:32 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/alvi6xs0.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/l135u0am.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=18390', 'data', 'file=/tmp/tmprjkocm4m/alvi6xs0.json', 'init=/tmp/tmprjkocm4m/l135u0am.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_model7j6py8hp/prophet_model-20250723142732.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:27:32 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:27:32 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/6rvt9dt5.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/y7w05mpe.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=6064', 'data', 'file=/tmp/tmprjkocm4m/6rvt9dt5.json', 'init=/tmp/tmprjkocm4m/y7w05mpe.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelzx285i_x/prophet_model-20250723142733.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:27:33 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:27:33 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/6ws1o14o.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/yj9xeaw5.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=57084', 'data', 'file=/tmp/tmprjkocm4m/6ws1o14o.json', 'init=/tmp/tmprjkocm4m/yj9xeaw5.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modeltfwe1azm/prophet_model-20250723142733.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:27:34 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:27:34 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/0_ha6_es.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/80y4z0qs.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=82127', 'data', 'file=/tmp/tmprjkocm4m/0_ha6_es.json', 'init=/tmp/tmprjkocm4m/80y4z0qs.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelhg28fgqr/prophet_model-20250723142734.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:27:34 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:27:35 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/q7bqzno3.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/arav09wr.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=69149', 'data', 'file=/tmp/tmprjkocm4m/q7bqzno3.json', 'init=/tmp/tmprjkocm4m/arav09wr.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_model_0k_gwxi/prophet_model-20250723142735.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:27:35 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:27:36 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/a_wxfuz1.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/x1_dz2um.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=79623', 'data', 'file=/tmp/tmprjkocm4m/a_wxfuz1.json', 'init=/tmp/tmprjkocm4m/x1_dz2um.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_model3zsz2fsm/prophet_model-20250723142736.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:27:36 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:27:36 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/2z3mu9uq.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/ghs_5hi3.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=80892', 'data', 'file=/tmp/tmprjkocm4m/2z3mu9uq.json', 'init=/tmp/tmprjkocm4m/ghs_5hi3.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelzmttl_kw/prophet_model-20250723142737.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:27:37 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:27:37 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/0czxr6n3.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/_n6wsd0y.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=6662', 'data', 'file=/tmp/tmprjkocm4m/0czxr6n3.json', 'init=/tmp/tmprjkocm4m/_n6wsd0y.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_model25eiolvn/prophet_model-20250723142737.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:27:37 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:27:38 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/c9lghnac.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/h21hf3u2.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=75262', 'data', 'file=/tmp/tmprjkocm4m/c9lghnac.json', 'init=/tmp/tmprjkocm4m/h21hf3u2.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modeleic9ik8d/prophet_model-20250723142738.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:27:38 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:27:38 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/0xko49k2.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/wbtnovqr.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=1437', 'data', 'file=/tmp/tmprjkocm4m/0xko49k2.json', 'init=/tmp/tmprjkocm4m/wbtnovqr.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelv5a_av8n/prophet_model-20250723142739.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:27:39 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:27:39 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing
📊 Summary of Best Models per Country and Target:
Country Target ARIMA_RMSE Prophet_RMSE RF_RMSE 🎯 Best_Model
Bangladesh Cardiovascular diseases 1.175582 6.991238 4.924493 ARIMA
Bangladesh Diabetes 0.000036 2.987844 0.101733 ARIMA
Bangladesh Life expectancy 2.312728 1.676697 2.298684 Prophet
Brazil Cardiovascular diseases 1.819507 6.547227 3.512954 ARIMA
Brazil Diabetes 0.000000 0.186005 0.045713 ARIMA
Brazil Life expectancy 3.009573 2.189554 1.286215 RF
Germany Cardiovascular diseases 0.433925 2.125500 0.950348 ARIMA
Germany Diabetes 0.000000 2.758175 0.000000 ARIMA
Germany Life expectancy 0.474573 0.612408 0.336656 RF
India Cardiovascular diseases 19.662985 37.420988 47.551155 ARIMA
India Diabetes 0.019744 0.830592 0.001732 RF
India Life expectancy 1.973657 2.475751 2.190597 ARIMA
Indonesia Cardiovascular diseases 8.486563 7.998086 0.097082 RF
Indonesia Diabetes 0.000000 0.712114 0.003464 ARIMA
Indonesia Life expectancy 1.887179 1.692886 1.644150 RF
Japan Cardiovascular diseases 1.547668 7.688441 4.237571 ARIMA
Japan Diabetes 0.000000 1.841061 0.016166 ARIMA
Japan Life expectancy 0.638746 0.576474 0.319972 RF
Kenya Cardiovascular diseases 0.121752 0.933468 0.799274 ARIMA
Kenya Diabetes 0.000379 3.479734 0.005196 ARIMA
Kenya Life expectancy 3.235337 1.670562 1.293366 RF
Mexico Cardiovascular diseases 0.578806 0.843693 6.276441 ARIMA
Mexico Diabetes 0.000000 0.799705 0.412910 ARIMA
Mexico Life expectancy 6.224500 2.428620 2.490150 Prophet
Nigeria Cardiovascular diseases 0.716350 4.498448 3.617701 ARIMA
Nigeria Diabetes 0.000000 0.140798 0.002708 ARIMA
Nigeria Life expectancy 0.700330 0.369290 1.244393 Prophet
United States Cardiovascular diseases 1.190369 11.974926 10.091925 ARIMA
United States Diabetes 0.007983 0.489566 0.004000 RF
United States Life expectancy 1.996910 1.561422 1.217660 RF
📋 Forecasts for Selected Years:
Country Target Year ARIMA_RMSE ARIMA_Forecast Prophet_RMSE Prophet_Forecast RF_RMSE RF_Forecast
Bangladesh Cardiovascular diseases 2025 1.175582 30.440474 6.991238 22.343115 4.924493 23.340201
Bangladesh Cardiovascular diseases 2030 1.175582 31.940794 6.991238 24.463361 4.924493 23.340201
Bangladesh Cardiovascular diseases 2040 1.175582 34.260180 6.991238 28.785528 4.924493 23.340201
Bangladesh Cardiovascular diseases 2050 1.175582 35.897214 6.991238 32.425885 4.924493 23.340201
Bangladesh Cardiovascular diseases 2060 1.175582 37.052640 6.991238 36.748052 4.924493 23.340201
Bangladesh Diabetes 2025 0.000036 9.800033 2.987844 6.593908 0.101733 9.643000
Bangladesh Diabetes 2030 0.000036 9.800026 2.987844 6.364059 0.101733 9.643000
Bangladesh Diabetes 2040 0.000036 9.800027 2.987844 5.788692 0.101733 9.643000
Bangladesh Diabetes 2050 0.000036 9.800027 2.987844 5.211562 0.101733 9.643000
Bangladesh Diabetes 2060 0.000036 9.800027 2.987844 4.636195 0.101733 9.643000
Bangladesh Life expectancy 2025 2.312728 71.671340 1.676697 76.010260 2.298684 71.741197
Bangladesh Life expectancy 2030 2.312728 71.671358 1.676697 77.911348 2.298684 71.741197
Bangladesh Life expectancy 2040 2.312728 71.671358 1.676697 83.766696 2.298684 71.741197
Bangladesh Life expectancy 2050 2.312728 71.671358 1.676697 88.736171 2.298684 71.741197
Bangladesh Life expectancy 2060 2.312728 71.671358 1.676697 94.591519 2.298684 71.741197
Brazil Cardiovascular diseases 2025 1.819507 37.512576 6.547227 34.048560 3.512954 35.433321
Brazil Cardiovascular diseases 2030 1.819507 38.026433 6.547227 37.240321 3.512954 35.433321
Brazil Cardiovascular diseases 2040 1.819507 38.724635 6.547227 43.801768 3.512954 35.433321
Brazil Cardiovascular diseases 2050 1.819507 39.136841 6.547227 49.309747 3.512954 35.433321
Brazil Cardiovascular diseases 2060 1.819507 39.380199 6.547227 55.871194 3.512954 35.433321
Brazil Diabetes 2025 0.000000 8.300000 0.186005 8.233562 0.045713 8.348000
Brazil Diabetes 2030 0.000000 8.300000 0.186005 8.506609 0.045713 8.348000
Brazil Diabetes 2040 0.000000 8.300000 0.186005 8.908965 0.045713 8.348000
Brazil Diabetes 2050 0.000000 8.300000 0.186005 9.428119 0.045713 8.348000
Brazil Diabetes 2060 0.000000 8.300000 0.186005 9.830475 0.045713 8.348000
Brazil Life expectancy 2025 3.009573 69.422619 2.189554 77.316414 1.286215 73.905454
Brazil Life expectancy 2030 3.009573 65.412065 2.189554 78.606403 1.286215 73.905454
Brazil Life expectancy 2040 3.009573 59.751236 2.189554 81.065716 1.286215 73.905454
Brazil Life expectancy 2050 3.009573 56.227383 2.189554 83.708557 1.286215 73.905454
Brazil Life expectancy 2060 3.009573 54.033793 2.189554 86.167871 1.286215 73.905454
Evaluation metrics (RMSE, MAPE, R²)¶
# Evaluation metrics (RMSE, MAPE, R²)
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
def calculate_metrics(actual, predicted):
rmse = np.sqrt(mean_squared_error(actual, predicted))
mae = mean_absolute_error(actual, predicted)
r2 = r2_score(actual, predicted)
mape = np.mean(np.abs((actual - predicted) / actual)) * 100
return round(rmse, 4), round(mape, 2), round(r2, 4)
metrics_summary = []
eval_results = [] # Add this above your for-country loop to initialize the collector
# Evaluation years
eval_years = [2021, 2022, 2023]
for country in selected_countries:
df_country = df_forecast_ready[df_forecast_ready['Country'] == country]
for target in target_columns:
if target not in df_country.columns:
continue
actual = df_country[df_country['Year'].isin(eval_years)][target].values
# --- ARIMA ---
try:
train_series = df_country[df_country['Year'].between(1950, 2020)][[target]]
train_series.index = pd.date_range(start='1950', periods=len(train_series), freq='YE')
model_arima = ARIMA(train_series, order=(1, 1, 1)).fit()
arima_pred = model_arima.predict(start=len(train_series), end=len(train_series)+len(eval_years)-1)
arima_rmse, arima_mape, arima_r2 = calculate_metrics(actual, arima_pred)
metrics_summary.append({
"Country": country, "Target": target, "Model": "ARIMA",
"RMSE": arima_rmse, "MAPE": arima_mape, "R²": arima_r2
})
except:
pass
# --- Prophet ---
try:
prophet_df = df_country[df_country['Year'].between(1950, 2020)][['Year', target]].rename(columns={'Year': 'ds', target: 'y'})
prophet_df['ds'] = pd.to_datetime(prophet_df['ds'], format='%Y')
model_prophet = Prophet()
model_prophet.fit(prophet_df)
future_eval = pd.DataFrame({'ds': pd.to_datetime(eval_years, format='%Y')})
prophet_pred = model_prophet.predict(future_eval)['yhat'].values
prophet_rmse, prophet_mape, prophet_r2 = calculate_metrics(actual, prophet_pred)
metrics_summary.append({
"Country": country, "Target": target, "Model": "Prophet",
"RMSE": prophet_rmse, "MAPE": prophet_mape, "R²": prophet_r2
})
except:
pass
# --- Random Forest ---
try:
features = selected_features_dict.get(target, [])
available = [f for f in features if f in df_country.columns]
X = df_country[available]
y = df_country[target]
X_train = X[df_country['Year'].between(1950, 2020)]
y_train = y[df_country['Year'].between(1950, 2020)]
X_eval = X[df_country['Year'].isin(eval_years)]
model_rf = RandomForestRegressor(n_estimators=100, random_state=42)
model_rf.fit(X_train, y_train)
rf_pred = model_rf.predict(X_eval)
rf_rmse, rf_mape, rf_r2 = calculate_metrics(actual, rf_pred)
# ✅ Add this block
eval_rows = pd.DataFrame({
"Country": [country] * len(eval_years),
"Target": [target] * len(eval_years),
"Year": eval_years,
"Prediction": rf_pred,
"Actual": actual
})
eval_results.append(eval_rows)
metrics_summary.append({
"Country": country, "Target": target, "Model": "Random Forest",
"RMSE": rf_rmse, "MAPE": rf_mape, "R²": rf_r2
})
except:
pass
df_eval_pred = pd.concat(eval_results, ignore_index=True)
def pick_best_model(group):
return group.loc[group['RMSE'].idxmin(), 'Model']
# Convert to DataFrame
df_metrics = pd.DataFrame(metrics_summary)
# Sort it and assign it to df_metrics_sorted
df_metrics_sorted = df_metrics.sort_values(['Country', 'Target', 'Model']).reset_index(drop=True)
# Best model picker function
def pick_best_model(group):
return group.loc[group['RMSE'].idxmin(), 'Model']
# Assign Best_Model using groupby and transform
df_metrics_sorted['Best_Model'] = df_metrics_sorted.groupby(['Country', 'Target'])['RMSE'].transform(
lambda x: df_metrics_sorted.loc[x.idxmin(), 'Model']
)
# Display full table
print("\n🎯 Step 20: Evaluation Summary with Best Model\n")
print(df_metrics_sorted[['Country', 'Target', 'Model', 'RMSE', 'MAPE', 'R²', 'Best_Model']].to_string(index=False))
# Export summary
df_metrics_sorted.to_csv("df_metrics_sorted.csv", index=False)
# Download to your computer
from google.colab import files
files.download("df_metrics_sorted.csv")
INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/eqjjlcvx.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/z2w7pcm_.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=59368', 'data', 'file=/tmp/tmprjkocm4m/eqjjlcvx.json', 'init=/tmp/tmprjkocm4m/z2w7pcm_.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelxyjgjdq5/prophet_model-20250723142739.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:27:39 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:27:40 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/vg5i6vka.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/lmka780h.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=75109', 'data', 'file=/tmp/tmprjkocm4m/vg5i6vka.json', 'init=/tmp/tmprjkocm4m/lmka780h.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelh7h91zgo/prophet_model-20250723142741.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:27:41 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:27:41 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/gmsvuhmu.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/0y0ma29h.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=19632', 'data', 'file=/tmp/tmprjkocm4m/gmsvuhmu.json', 'init=/tmp/tmprjkocm4m/0y0ma29h.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelkw7mltvl/prophet_model-20250723142742.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:27:42 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:27:43 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/smiy06t0.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/riq4u0nn.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=36038', 'data', 'file=/tmp/tmprjkocm4m/smiy06t0.json', 'init=/tmp/tmprjkocm4m/riq4u0nn.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelczq1p405/prophet_model-20250723142743.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:27:43 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:27:44 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/t008m8yn.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/6dwjfhiq.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=99765', 'data', 'file=/tmp/tmprjkocm4m/t008m8yn.json', 'init=/tmp/tmprjkocm4m/6dwjfhiq.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelr_kg546m/prophet_model-20250723142744.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:27:44 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:27:45 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/oadqeb0v.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/pk4m4u71.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=12168', 'data', 'file=/tmp/tmprjkocm4m/oadqeb0v.json', 'init=/tmp/tmprjkocm4m/pk4m4u71.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelq7u60i3_/prophet_model-20250723142746.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:27:46 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:27:46 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/h40__qsi.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/n32vskud.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=57212', 'data', 'file=/tmp/tmprjkocm4m/h40__qsi.json', 'init=/tmp/tmprjkocm4m/n32vskud.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelz3ogozpi/prophet_model-20250723142747.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:27:47 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:27:48 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/gscmz__t.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/ik9qkw_0.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=25384', 'data', 'file=/tmp/tmprjkocm4m/gscmz__t.json', 'init=/tmp/tmprjkocm4m/ik9qkw_0.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modeldacp3582/prophet_model-20250723142749.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:27:49 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:27:49 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/7hpoi5td.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/zn3uklyn.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=93924', 'data', 'file=/tmp/tmprjkocm4m/7hpoi5td.json', 'init=/tmp/tmprjkocm4m/zn3uklyn.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_model0j5cee4e/prophet_model-20250723142749.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:27:50 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:27:50 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/8psco2ov.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/tkhr35wn.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=84665', 'data', 'file=/tmp/tmprjkocm4m/8psco2ov.json', 'init=/tmp/tmprjkocm4m/tkhr35wn.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modell9z44mqp/prophet_model-20250723142750.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:27:50 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:27:51 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/spgmtiay.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/io0cpspk.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=3082', 'data', 'file=/tmp/tmprjkocm4m/spgmtiay.json', 'init=/tmp/tmprjkocm4m/io0cpspk.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelvtbox0en/prophet_model-20250723142751.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:27:51 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:27:51 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/eo_te0ht.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/vc7l2mzf.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=60614', 'data', 'file=/tmp/tmprjkocm4m/eo_te0ht.json', 'init=/tmp/tmprjkocm4m/vc7l2mzf.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelchrkwy9s/prophet_model-20250723142752.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:27:52 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:27:52 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/pnk604ek.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/g732zgm_.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=35287', 'data', 'file=/tmp/tmprjkocm4m/pnk604ek.json', 'init=/tmp/tmprjkocm4m/g732zgm_.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modeli6h6u0iz/prophet_model-20250723142752.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:27:52 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:27:53 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/b7vblsr6.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/fr2_eiu4.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=35398', 'data', 'file=/tmp/tmprjkocm4m/b7vblsr6.json', 'init=/tmp/tmprjkocm4m/fr2_eiu4.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelmhphwbs7/prophet_model-20250723142755.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:27:55 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:27:55 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/r9g89p_f.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/ijnrdngf.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=54075', 'data', 'file=/tmp/tmprjkocm4m/r9g89p_f.json', 'init=/tmp/tmprjkocm4m/ijnrdngf.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelnxglfmc3/prophet_model-20250723142756.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:27:56 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:27:56 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/ygbj5tne.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/l9fgdivj.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=54945', 'data', 'file=/tmp/tmprjkocm4m/ygbj5tne.json', 'init=/tmp/tmprjkocm4m/l9fgdivj.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelrv7d2fb0/prophet_model-20250723142756.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:27:56 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:27:57 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/9460n9e5.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/08bkdmuo.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=1255', 'data', 'file=/tmp/tmprjkocm4m/9460n9e5.json', 'init=/tmp/tmprjkocm4m/08bkdmuo.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_model_8o8l24m/prophet_model-20250723142757.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:27:57 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:27:57 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/8pdq6ykg.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/28lfdde0.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=1282', 'data', 'file=/tmp/tmprjkocm4m/8pdq6ykg.json', 'init=/tmp/tmprjkocm4m/28lfdde0.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelhlgecqdp/prophet_model-20250723142758.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:27:58 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:27:58 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/tdpf1_54.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/0dwo8sfe.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=38022', 'data', 'file=/tmp/tmprjkocm4m/tdpf1_54.json', 'init=/tmp/tmprjkocm4m/0dwo8sfe.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelzxjdv7s4/prophet_model-20250723142758.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:27:58 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:27:59 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/1p2t8ykc.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/vrwn2mde.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=28166', 'data', 'file=/tmp/tmprjkocm4m/1p2t8ykc.json', 'init=/tmp/tmprjkocm4m/vrwn2mde.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modeluzp444cl/prophet_model-20250723142759.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:27:59 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:27:59 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/njpwbnzq.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/kt8aa3fc.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=58212', 'data', 'file=/tmp/tmprjkocm4m/njpwbnzq.json', 'init=/tmp/tmprjkocm4m/kt8aa3fc.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_model92bmcut4/prophet_model-20250723142800.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:28:00 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:28:00 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/6_vkip_j.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/o8grblo9.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=4955', 'data', 'file=/tmp/tmprjkocm4m/6_vkip_j.json', 'init=/tmp/tmprjkocm4m/o8grblo9.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_model0i3bmw6j/prophet_model-20250723142800.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:28:00 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:28:01 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/lhv_eq_n.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/w9tg63fy.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=13290', 'data', 'file=/tmp/tmprjkocm4m/lhv_eq_n.json', 'init=/tmp/tmprjkocm4m/w9tg63fy.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_model1kzgl0y3/prophet_model-20250723142801.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:28:01 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:28:01 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/isyciumg.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/e5lx5x4_.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=7456', 'data', 'file=/tmp/tmprjkocm4m/isyciumg.json', 'init=/tmp/tmprjkocm4m/e5lx5x4_.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modeliabjxlt5/prophet_model-20250723142802.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:28:02 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:28:02 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/gfse9eex.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/p41bc0hj.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=18278', 'data', 'file=/tmp/tmprjkocm4m/gfse9eex.json', 'init=/tmp/tmprjkocm4m/p41bc0hj.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelasbnq6yo/prophet_model-20250723142802.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:28:02 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:28:03 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/vxqrg_hg.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/8m_ipt_q.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=62453', 'data', 'file=/tmp/tmprjkocm4m/vxqrg_hg.json', 'init=/tmp/tmprjkocm4m/8m_ipt_q.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelentrb2vk/prophet_model-20250723142803.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:28:03 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:28:04 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/twg124ht.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/w5quimbo.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=76939', 'data', 'file=/tmp/tmprjkocm4m/twg124ht.json', 'init=/tmp/tmprjkocm4m/w5quimbo.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modeljgyb62fy/prophet_model-20250723142804.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:28:04 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:28:04 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/t62344g0.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/tisx2xkx.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=79164', 'data', 'file=/tmp/tmprjkocm4m/t62344g0.json', 'init=/tmp/tmprjkocm4m/tisx2xkx.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_model39mmf_st/prophet_model-20250723142804.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:28:04 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:28:05 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/b1h9unmc.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/hgcljw8a.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=47767', 'data', 'file=/tmp/tmprjkocm4m/b1h9unmc.json', 'init=/tmp/tmprjkocm4m/hgcljw8a.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelfey3cwl9/prophet_model-20250723142805.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:28:05 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:28:05 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/adkcdrh2.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/evt78x3n.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=17956', 'data', 'file=/tmp/tmprjkocm4m/adkcdrh2.json', 'init=/tmp/tmprjkocm4m/evt78x3n.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelpg39k48t/prophet_model-20250723142806.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:28:06 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:28:06 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing
🎯 Step 20: Evaluation Summary with Best Model
Country Target Model RMSE MAPE R² Best_Model
Bangladesh Cardiovascular diseases ARIMA 1.1756 4.02 -1.094927e+29 ARIMA
Bangladesh Cardiovascular diseases Prophet 6.9912 24.69 -3.872468e+30 ARIMA
Bangladesh Cardiovascular diseases Random Forest 4.9245 17.42 -1.921333e+30 ARIMA
Bangladesh Diabetes ARIMA 0.0000 0.00 0.000000e+00 ARIMA
Bangladesh Diabetes Prophet 2.9878 30.49 0.000000e+00 ARIMA
Bangladesh Diabetes Random Forest 0.1017 0.81 0.000000e+00 ARIMA
Bangladesh Life expectancy ARIMA 2.3127 2.76 -1.102500e+00 Prophet
Bangladesh Life expectancy Prophet 1.6767 1.89 -1.051000e-01 Prophet
Bangladesh Life expectancy Random Forest 2.2987 2.94 -1.077100e+00 Prophet
Brazil Cardiovascular diseases ARIMA 1.8195 4.66 0.000000e+00 ARIMA
Brazil Cardiovascular diseases Prophet 6.5472 16.73 0.000000e+00 ARIMA
Brazil Cardiovascular diseases Random Forest 3.5130 9.02 0.000000e+00 ARIMA
Brazil Diabetes ARIMA 0.0000 0.00 1.000000e+00 ARIMA
Brazil Diabetes Prophet 0.1860 2.14 0.000000e+00 ARIMA
Brazil Diabetes Random Forest 0.0457 0.55 0.000000e+00 ARIMA
Brazil Life expectancy ARIMA 3.0096 3.29 -5.672800e+00 Random Forest
Brazil Life expectancy Prophet 2.1896 2.66 -2.531900e+00 Random Forest
Brazil Life expectancy Random Forest 1.2862 1.52 -2.188000e-01 Random Forest
Germany Cardiovascular diseases ARIMA 0.4339 1.23 0.000000e+00 ARIMA
Germany Cardiovascular diseases Prophet 2.1255 5.82 0.000000e+00 ARIMA
Germany Cardiovascular diseases Random Forest 0.9503 2.69 0.000000e+00 ARIMA
Germany Diabetes ARIMA 0.0000 0.00 1.000000e+00 ARIMA
Germany Diabetes Prophet 2.7582 55.13 0.000000e+00 ARIMA
Germany Diabetes Random Forest 0.0000 0.00 1.000000e+00 ARIMA
Germany Life expectancy ARIMA 0.4746 0.44 -1.051900e+00 Random Forest
Germany Life expectancy Prophet 0.6124 0.65 -2.417000e+00 Random Forest
Germany Life expectancy Random Forest 0.3367 0.38 -3.260000e-02 Random Forest
India Cardiovascular diseases ARIMA 19.6630 6.68 0.000000e+00 ARIMA
India Cardiovascular diseases Prophet 37.4210 12.75 0.000000e+00 ARIMA
India Cardiovascular diseases Random Forest 47.5512 16.61 0.000000e+00 ARIMA
India Diabetes ARIMA 0.0197 0.21 0.000000e+00 Random Forest
India Diabetes Prophet 0.8306 9.49 0.000000e+00 Random Forest
India Diabetes Random Forest 0.0017 0.01 0.000000e+00 Random Forest
India Life expectancy ARIMA 1.9737 2.25 1.628000e-01 ARIMA
India Life expectancy Prophet 2.4758 2.42 -3.173000e-01 ARIMA
India Life expectancy Random Forest 2.1906 2.96 -3.130000e-02 ARIMA
Indonesia Cardiovascular diseases ARIMA 8.4866 11.75 0.000000e+00 Random Forest
Indonesia Cardiovascular diseases Prophet 7.9981 9.90 0.000000e+00 Random Forest
Indonesia Cardiovascular diseases Random Forest 0.0971 0.13 0.000000e+00 Random Forest
Indonesia Diabetes ARIMA 0.0000 0.00 1.000000e+00 ARIMA
Indonesia Diabetes Prophet 0.7121 9.24 0.000000e+00 ARIMA
Indonesia Diabetes Random Forest 0.0035 0.03 0.000000e+00 ARIMA
Indonesia Life expectancy ARIMA 1.8872 2.68 -2.444000e-01 Random Forest
Indonesia Life expectancy Prophet 1.6929 1.48 -1.400000e-03 Random Forest
Indonesia Life expectancy Random Forest 1.6442 2.28 5.540000e-02 Random Forest
Japan Cardiovascular diseases ARIMA 1.5477 3.73 0.000000e+00 ARIMA
Japan Cardiovascular diseases Prophet 7.6884 18.56 0.000000e+00 ARIMA
Japan Cardiovascular diseases Random Forest 4.2376 10.27 0.000000e+00 ARIMA
Japan Diabetes ARIMA 0.0000 0.00 1.000000e+00 ARIMA
Japan Diabetes Prophet 1.8411 27.47 0.000000e+00 ARIMA
Japan Diabetes Random Forest 0.0162 0.14 0.000000e+00 ARIMA
Japan Life expectancy ARIMA 0.6387 0.68 -4.204900e+00 Random Forest
Japan Life expectancy Prophet 0.5765 0.59 -3.239500e+00 Random Forest
Japan Life expectancy Random Forest 0.3200 0.37 -3.061000e-01 Random Forest
Kenya Cardiovascular diseases ARIMA 0.1218 3.48 -7.516462e+28 ARIMA
Kenya Cardiovascular diseases Prophet 0.9335 26.66 -4.418335e+30 ARIMA
Kenya Cardiovascular diseases Random Forest 0.7993 22.85 -3.239297e+30 ARIMA
Kenya Diabetes ARIMA 0.0004 0.01 0.000000e+00 ARIMA
Kenya Diabetes Prophet 3.4797 57.98 0.000000e+00 ARIMA
Kenya Diabetes Random Forest 0.0052 0.08 0.000000e+00 ARIMA
Kenya Life expectancy ARIMA 3.2353 4.35 -7.360000e+00 Random Forest
Kenya Life expectancy Prophet 1.6706 2.25 -1.228900e+00 Random Forest
Kenya Life expectancy Random Forest 1.2934 1.95 -3.360000e-01 Random Forest
Mexico Cardiovascular diseases ARIMA 0.5788 2.17 -2.654270e+28 ARIMA
Mexico Cardiovascular diseases Prophet 0.8437 3.08 -5.639601e+28 ARIMA
Mexico Cardiovascular diseases Random Forest 6.2764 28.37 -3.121092e+30 ARIMA
Mexico Diabetes ARIMA 0.0000 0.00 1.000000e+00 ARIMA
Mexico Diabetes Prophet 0.7997 7.13 -2.026747e+29 ARIMA
Mexico Diabetes Random Forest 0.4129 3.01 -5.403202e+28 ARIMA
Mexico Life expectancy ARIMA 6.2245 7.05 -6.367500e+00 Prophet
Mexico Life expectancy Prophet 2.4286 2.54 -1.216000e-01 Prophet
Mexico Life expectancy Random Forest 2.4902 3.34 -1.791000e-01 Prophet
Nigeria Cardiovascular diseases ARIMA 0.7164 3.98 0.000000e+00 ARIMA
Nigeria Cardiovascular diseases Prophet 4.4984 24.97 0.000000e+00 ARIMA
Nigeria Cardiovascular diseases Random Forest 3.6177 20.11 0.000000e+00 ARIMA
Nigeria Diabetes ARIMA 0.0000 0.00 1.000000e+00 ARIMA
Nigeria Diabetes Prophet 0.1408 2.06 0.000000e+00 ARIMA
Nigeria Diabetes Random Forest 0.0027 0.04 0.000000e+00 ARIMA
Nigeria Life expectancy ARIMA 0.7003 1.17 -1.846100e+00 Prophet
Nigeria Life expectancy Prophet 0.3693 0.58 2.086000e-01 Prophet
Nigeria Life expectancy Random Forest 1.2444 2.14 -7.985900e+00 Prophet
United States Cardiovascular diseases ARIMA 1.1904 1.29 0.000000e+00 ARIMA
United States Cardiovascular diseases Prophet 11.9749 12.93 0.000000e+00 ARIMA
United States Cardiovascular diseases Random Forest 10.0919 10.96 0.000000e+00 ARIMA
United States Diabetes ARIMA 0.0080 0.10 0.000000e+00 Random Forest
United States Diabetes Prophet 0.4896 6.65 0.000000e+00 Random Forest
United States Diabetes Random Forest 0.0040 0.05 0.000000e+00 Random Forest
United States Life expectancy ARIMA 1.9969 2.10 -1.796800e+00 Random Forest
United States Life expectancy Prophet 1.5614 1.63 -7.100000e-01 Random Forest
United States Life expectancy Random Forest 1.2177 1.39 -3.990000e-02 Random Forest
# Evaluation metrics (RMSE, MAPE, R²)
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
def calculate_metrics(actual, predicted):
rmse = np.sqrt(mean_squared_error(actual, predicted))
mae = mean_absolute_error(actual, predicted)
r2 = r2_score(actual, predicted)
mape = np.mean(np.abs((actual - predicted) / actual)) * 100
return round(rmse, 4), round(mape, 2), round(r2, 4)
metrics_summary = []
# Evaluation years
eval_years = [2021, 2022, 2023]
for country in selected_countries:
df_country = df_forecast_ready[df_forecast_ready['Country'] == country]
for target in target_columns:
if target not in df_country.columns:
continue
actual = df_country[df_country['Year'].isin(eval_years)][target].values
# --- ARIMA ---
try:
train_series = df_country[df_country['Year'].between(1950, 2020)][[target]]
train_series.index = pd.date_range(start='1950', periods=len(train_series), freq='YE')
model_arima = ARIMA(train_series, order=(1, 1, 1)).fit()
arima_pred = model_arima.predict(start=len(train_series), end=len(train_series)+len(eval_years)-1)
arima_rmse, arima_mape, arima_r2 = calculate_metrics(actual, arima_pred)
metrics_summary.append({
"Country": country, "Target": target, "Model": "ARIMA",
"RMSE": arima_rmse, "MAPE": arima_mape, "R²": arima_r2
})
except:
pass
# --- Prophet ---
try:
prophet_df = df_country[df_country['Year'].between(1950, 2020)][['Year', target]].rename(columns={'Year': 'ds', target: 'y'})
prophet_df['ds'] = pd.to_datetime(prophet_df['ds'], format='%Y')
model_prophet = Prophet()
model_prophet.fit(prophet_df)
future_eval = pd.DataFrame({'ds': pd.to_datetime(eval_years, format='%Y')})
prophet_pred = model_prophet.predict(future_eval)['yhat'].values
prophet_rmse, prophet_mape, prophet_r2 = calculate_metrics(actual, prophet_pred)
metrics_summary.append({
"Country": country, "Target": target, "Model": "Prophet",
"RMSE": prophet_rmse, "MAPE": prophet_mape, "R²": prophet_r2
})
except:
pass
# --- Random Forest ---
try:
features = selected_features_dict.get(target, [])
available = [f for f in features if f in df_country.columns]
X = df_country[available]
y = df_country[target]
X_train = X[df_country['Year'].between(1950, 2020)]
y_train = y[df_country['Year'].between(1950, 2020)]
X_eval = X[df_country['Year'].isin(eval_years)]
model_rf = RandomForestRegressor(n_estimators=100, random_state=42)
model_rf.fit(X_train, y_train)
rf_pred = model_rf.predict(X_eval)
rf_rmse, rf_mape, rf_r2 = calculate_metrics(actual, rf_pred)
metrics_summary.append({
"Country": country, "Target": target, "Model": "Random Forest",
"RMSE": rf_rmse, "MAPE": rf_mape, "R²": rf_r2
})
except:
pass
def pick_best_model(group):
return group.loc[group['RMSE'].idxmin(), 'Model']
# Convert to DataFrame
df_metrics = pd.DataFrame(metrics_summary)
# Sort it and assign it to df_metrics_sorted
df_metrics_sorted = df_metrics.sort_values(['Country', 'Target', 'Model']).reset_index(drop=True)
# Best model picker function
def pick_best_model(group):
return group.loc[group['RMSE'].idxmin(), 'Model']
# Assign Best_Model using groupby and transform
df_metrics_sorted['Best_Model'] = df_metrics_sorted.groupby(['Country', 'Target'])['RMSE'].transform(
lambda x: df_metrics_sorted.loc[x.idxmin(), 'Model']
)
# Display full table
print("\n🎯 Step 20: Evaluation Summary with Best Model\n")
print(df_metrics_sorted[['Country', 'Target', 'Model', 'RMSE', 'MAPE', 'R²', 'Best_Model']].to_string(index=False))
INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/96agqxb4.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/a63jie00.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=58230', 'data', 'file=/tmp/tmprjkocm4m/96agqxb4.json', 'init=/tmp/tmprjkocm4m/a63jie00.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelrbxqmd3k/prophet_model-20250723142811.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:28:11 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:28:11 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/f7yn6mcf.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/vekzyhes.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=5483', 'data', 'file=/tmp/tmprjkocm4m/f7yn6mcf.json', 'init=/tmp/tmprjkocm4m/vekzyhes.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelutxoc49d/prophet_model-20250723142812.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:28:12 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:28:12 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/56jfux6d.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/nxnbae66.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=42415', 'data', 'file=/tmp/tmprjkocm4m/56jfux6d.json', 'init=/tmp/tmprjkocm4m/nxnbae66.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_model78j27m6s/prophet_model-20250723142812.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:28:12 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:28:13 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/9ph1dk0k.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/to_q09k2.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=80779', 'data', 'file=/tmp/tmprjkocm4m/9ph1dk0k.json', 'init=/tmp/tmprjkocm4m/to_q09k2.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_model3wcrj73h/prophet_model-20250723142813.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:28:13 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:28:13 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/8lcv5y6z.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/kjesknro.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=15638', 'data', 'file=/tmp/tmprjkocm4m/8lcv5y6z.json', 'init=/tmp/tmprjkocm4m/kjesknro.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelcgb49lbt/prophet_model-20250723142814.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:28:14 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:28:14 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/75qvkpfx.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/i2i8_56b.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=7959', 'data', 'file=/tmp/tmprjkocm4m/75qvkpfx.json', 'init=/tmp/tmprjkocm4m/i2i8_56b.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelp1rbzqxl/prophet_model-20250723142815.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:28:15 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:28:16 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/35m11l02.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/2cf1fb8e.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=45962', 'data', 'file=/tmp/tmprjkocm4m/35m11l02.json', 'init=/tmp/tmprjkocm4m/2cf1fb8e.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelpjdmugoc/prophet_model-20250723142816.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:28:16 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:28:17 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/6b03wzex.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/r6bgqq_5.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=36355', 'data', 'file=/tmp/tmprjkocm4m/6b03wzex.json', 'init=/tmp/tmprjkocm4m/r6bgqq_5.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_model01dxs0d2/prophet_model-20250723142818.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:28:18 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:28:19 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/h_0isjxz.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/4666mwro.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=50895', 'data', 'file=/tmp/tmprjkocm4m/h_0isjxz.json', 'init=/tmp/tmprjkocm4m/4666mwro.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelujssn5hl/prophet_model-20250723142820.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:28:20 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:28:21 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/q5094zuy.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/d21_bqlg.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=42067', 'data', 'file=/tmp/tmprjkocm4m/q5094zuy.json', 'init=/tmp/tmprjkocm4m/d21_bqlg.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelavpi2gqq/prophet_model-20250723142822.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:28:22 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:28:22 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/t5p7euuw.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/4lqozamx.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=47422', 'data', 'file=/tmp/tmprjkocm4m/t5p7euuw.json', 'init=/tmp/tmprjkocm4m/4lqozamx.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelygoqwes7/prophet_model-20250723142823.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:28:23 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:28:24 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/tzl8hmsg.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/aqj2c0s9.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=54955', 'data', 'file=/tmp/tmprjkocm4m/tzl8hmsg.json', 'init=/tmp/tmprjkocm4m/aqj2c0s9.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_model34c4yai4/prophet_model-20250723142824.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:28:24 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:28:25 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/1jdjv1ym.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/5ply91ys.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=16233', 'data', 'file=/tmp/tmprjkocm4m/1jdjv1ym.json', 'init=/tmp/tmprjkocm4m/5ply91ys.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modeln8xdotuh/prophet_model-20250723142825.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:28:25 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:28:26 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/tg6xi57o.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/em7pt54s.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=82535', 'data', 'file=/tmp/tmprjkocm4m/tg6xi57o.json', 'init=/tmp/tmprjkocm4m/em7pt54s.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modellbr_d7wm/prophet_model-20250723142827.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:28:27 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:28:28 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/56s2q0ui.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/gy08iedo.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=60748', 'data', 'file=/tmp/tmprjkocm4m/56s2q0ui.json', 'init=/tmp/tmprjkocm4m/gy08iedo.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelg7jfvios/prophet_model-20250723142828.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:28:28 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:28:28 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/mu2y1iux.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/v7jgd4r2.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=9337', 'data', 'file=/tmp/tmprjkocm4m/mu2y1iux.json', 'init=/tmp/tmprjkocm4m/v7jgd4r2.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelku5nak1k/prophet_model-20250723142829.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:28:29 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:28:29 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/jzvyvewp.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/mr48mr14.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=23976', 'data', 'file=/tmp/tmprjkocm4m/jzvyvewp.json', 'init=/tmp/tmprjkocm4m/mr48mr14.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_model3lpo3ui7/prophet_model-20250723142829.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:28:29 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:28:30 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/ofu8h322.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/k8n5hxd9.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=70178', 'data', 'file=/tmp/tmprjkocm4m/ofu8h322.json', 'init=/tmp/tmprjkocm4m/k8n5hxd9.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelxh05y0r4/prophet_model-20250723142830.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:28:30 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:28:30 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/g2xiz9hi.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/t5g5emio.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=22075', 'data', 'file=/tmp/tmprjkocm4m/g2xiz9hi.json', 'init=/tmp/tmprjkocm4m/t5g5emio.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_model8cduqjex/prophet_model-20250723142831.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:28:31 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:28:31 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/m0rwqk4l.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/xnpt149d.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=77601', 'data', 'file=/tmp/tmprjkocm4m/m0rwqk4l.json', 'init=/tmp/tmprjkocm4m/xnpt149d.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modeldubr4w1q/prophet_model-20250723142831.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:28:31 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:28:32 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/e2it4bm5.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/jh2zfokd.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=61228', 'data', 'file=/tmp/tmprjkocm4m/e2it4bm5.json', 'init=/tmp/tmprjkocm4m/jh2zfokd.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelhr6l8_bh/prophet_model-20250723142833.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:28:33 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:28:33 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/lnzx2fbb.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/ledcbnmk.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=22655', 'data', 'file=/tmp/tmprjkocm4m/lnzx2fbb.json', 'init=/tmp/tmprjkocm4m/ledcbnmk.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modeld9g72k_0/prophet_model-20250723142834.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:28:34 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:28:35 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/abygc97f.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/3vxzg627.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=41273', 'data', 'file=/tmp/tmprjkocm4m/abygc97f.json', 'init=/tmp/tmprjkocm4m/3vxzg627.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelpq_jed9k/prophet_model-20250723142835.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:28:35 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:28:35 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/pwjgxwnc.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/qvr8h8ur.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=51176', 'data', 'file=/tmp/tmprjkocm4m/pwjgxwnc.json', 'init=/tmp/tmprjkocm4m/qvr8h8ur.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelukr7_ca3/prophet_model-20250723142835.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:28:35 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:28:36 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/0e4wunt5.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/x3p3rsx4.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=22674', 'data', 'file=/tmp/tmprjkocm4m/0e4wunt5.json', 'init=/tmp/tmprjkocm4m/x3p3rsx4.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelzbgkl07q/prophet_model-20250723142836.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:28:36 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:28:37 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/lbc7c6mv.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/msjjsc5w.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=91260', 'data', 'file=/tmp/tmprjkocm4m/lbc7c6mv.json', 'init=/tmp/tmprjkocm4m/msjjsc5w.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_model17vi9bbj/prophet_model-20250723142837.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:28:37 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:28:37 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/qjasgdr_.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/u1fq__1x.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=81508', 'data', 'file=/tmp/tmprjkocm4m/qjasgdr_.json', 'init=/tmp/tmprjkocm4m/u1fq__1x.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_model5d4_6ltf/prophet_model-20250723142838.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:28:38 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:28:38 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/0zuanv8u.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/j0tr_zru.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=72026', 'data', 'file=/tmp/tmprjkocm4m/0zuanv8u.json', 'init=/tmp/tmprjkocm4m/j0tr_zru.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_model4fjx7_g4/prophet_model-20250723142838.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:28:38 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:28:39 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/m8l9oi1q.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/fosm3gdg.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=75488', 'data', 'file=/tmp/tmprjkocm4m/m8l9oi1q.json', 'init=/tmp/tmprjkocm4m/fosm3gdg.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelit1ljs_j/prophet_model-20250723142839.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:28:39 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:28:39 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/vdfsimm5.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/559dtkca.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=38508', 'data', 'file=/tmp/tmprjkocm4m/vdfsimm5.json', 'init=/tmp/tmprjkocm4m/559dtkca.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_model2fwcy8wp/prophet_model-20250723142839.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 14:28:39 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 14:28:40 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing
🎯 Step 20: Evaluation Summary with Best Model
Country Target Model RMSE MAPE R² Best_Model
Bangladesh Cardiovascular diseases ARIMA 1.1756 4.02 -1.094927e+29 ARIMA
Bangladesh Cardiovascular diseases Prophet 6.9912 24.69 -3.872468e+30 ARIMA
Bangladesh Cardiovascular diseases Random Forest 4.9245 17.42 -1.921333e+30 ARIMA
Bangladesh Diabetes ARIMA 0.0000 0.00 0.000000e+00 ARIMA
Bangladesh Diabetes Prophet 2.9878 30.49 0.000000e+00 ARIMA
Bangladesh Diabetes Random Forest 0.1017 0.81 0.000000e+00 ARIMA
Bangladesh Life expectancy ARIMA 2.3127 2.76 -1.102500e+00 Prophet
Bangladesh Life expectancy Prophet 1.6767 1.89 -1.051000e-01 Prophet
Bangladesh Life expectancy Random Forest 2.2987 2.94 -1.077100e+00 Prophet
Brazil Cardiovascular diseases ARIMA 1.8195 4.66 0.000000e+00 ARIMA
Brazil Cardiovascular diseases Prophet 6.5472 16.73 0.000000e+00 ARIMA
Brazil Cardiovascular diseases Random Forest 3.5130 9.02 0.000000e+00 ARIMA
Brazil Diabetes ARIMA 0.0000 0.00 1.000000e+00 ARIMA
Brazil Diabetes Prophet 0.1860 2.14 0.000000e+00 ARIMA
Brazil Diabetes Random Forest 0.0457 0.55 0.000000e+00 ARIMA
Brazil Life expectancy ARIMA 3.0096 3.29 -5.672800e+00 Random Forest
Brazil Life expectancy Prophet 2.1896 2.66 -2.531900e+00 Random Forest
Brazil Life expectancy Random Forest 1.2862 1.52 -2.188000e-01 Random Forest
Germany Cardiovascular diseases ARIMA 0.4339 1.23 0.000000e+00 ARIMA
Germany Cardiovascular diseases Prophet 2.1255 5.82 0.000000e+00 ARIMA
Germany Cardiovascular diseases Random Forest 0.9503 2.69 0.000000e+00 ARIMA
Germany Diabetes ARIMA 0.0000 0.00 1.000000e+00 ARIMA
Germany Diabetes Prophet 2.7582 55.13 0.000000e+00 ARIMA
Germany Diabetes Random Forest 0.0000 0.00 1.000000e+00 ARIMA
Germany Life expectancy ARIMA 0.4746 0.44 -1.051900e+00 Random Forest
Germany Life expectancy Prophet 0.6124 0.65 -2.417000e+00 Random Forest
Germany Life expectancy Random Forest 0.3367 0.38 -3.260000e-02 Random Forest
India Cardiovascular diseases ARIMA 19.6630 6.68 0.000000e+00 ARIMA
India Cardiovascular diseases Prophet 37.4210 12.75 0.000000e+00 ARIMA
India Cardiovascular diseases Random Forest 47.5512 16.61 0.000000e+00 ARIMA
India Diabetes ARIMA 0.0197 0.21 0.000000e+00 Random Forest
India Diabetes Prophet 0.8306 9.49 0.000000e+00 Random Forest
India Diabetes Random Forest 0.0017 0.01 0.000000e+00 Random Forest
India Life expectancy ARIMA 1.9737 2.25 1.628000e-01 ARIMA
India Life expectancy Prophet 2.4758 2.42 -3.173000e-01 ARIMA
India Life expectancy Random Forest 2.1906 2.96 -3.130000e-02 ARIMA
Indonesia Cardiovascular diseases ARIMA 8.4866 11.75 0.000000e+00 Random Forest
Indonesia Cardiovascular diseases Prophet 7.9981 9.90 0.000000e+00 Random Forest
Indonesia Cardiovascular diseases Random Forest 0.0971 0.13 0.000000e+00 Random Forest
Indonesia Diabetes ARIMA 0.0000 0.00 1.000000e+00 ARIMA
Indonesia Diabetes Prophet 0.7121 9.24 0.000000e+00 ARIMA
Indonesia Diabetes Random Forest 0.0035 0.03 0.000000e+00 ARIMA
Indonesia Life expectancy ARIMA 1.8872 2.68 -2.444000e-01 Random Forest
Indonesia Life expectancy Prophet 1.6929 1.48 -1.400000e-03 Random Forest
Indonesia Life expectancy Random Forest 1.6442 2.28 5.540000e-02 Random Forest
Japan Cardiovascular diseases ARIMA 1.5477 3.73 0.000000e+00 ARIMA
Japan Cardiovascular diseases Prophet 7.6884 18.56 0.000000e+00 ARIMA
Japan Cardiovascular diseases Random Forest 4.2376 10.27 0.000000e+00 ARIMA
Japan Diabetes ARIMA 0.0000 0.00 1.000000e+00 ARIMA
Japan Diabetes Prophet 1.8411 27.47 0.000000e+00 ARIMA
Japan Diabetes Random Forest 0.0162 0.14 0.000000e+00 ARIMA
Japan Life expectancy ARIMA 0.6387 0.68 -4.204900e+00 Random Forest
Japan Life expectancy Prophet 0.5765 0.59 -3.239500e+00 Random Forest
Japan Life expectancy Random Forest 0.3200 0.37 -3.061000e-01 Random Forest
Kenya Cardiovascular diseases ARIMA 0.1218 3.48 -7.516462e+28 ARIMA
Kenya Cardiovascular diseases Prophet 0.9335 26.66 -4.418335e+30 ARIMA
Kenya Cardiovascular diseases Random Forest 0.7993 22.85 -3.239297e+30 ARIMA
Kenya Diabetes ARIMA 0.0004 0.01 0.000000e+00 ARIMA
Kenya Diabetes Prophet 3.4797 57.98 0.000000e+00 ARIMA
Kenya Diabetes Random Forest 0.0052 0.08 0.000000e+00 ARIMA
Kenya Life expectancy ARIMA 3.2353 4.35 -7.360000e+00 Random Forest
Kenya Life expectancy Prophet 1.6706 2.25 -1.228900e+00 Random Forest
Kenya Life expectancy Random Forest 1.2934 1.95 -3.360000e-01 Random Forest
Mexico Cardiovascular diseases ARIMA 0.5788 2.17 -2.654270e+28 ARIMA
Mexico Cardiovascular diseases Prophet 0.8437 3.08 -5.639601e+28 ARIMA
Mexico Cardiovascular diseases Random Forest 6.2764 28.37 -3.121092e+30 ARIMA
Mexico Diabetes ARIMA 0.0000 0.00 1.000000e+00 ARIMA
Mexico Diabetes Prophet 0.7997 7.13 -2.026747e+29 ARIMA
Mexico Diabetes Random Forest 0.4129 3.01 -5.403202e+28 ARIMA
Mexico Life expectancy ARIMA 6.2245 7.05 -6.367500e+00 Prophet
Mexico Life expectancy Prophet 2.4286 2.54 -1.216000e-01 Prophet
Mexico Life expectancy Random Forest 2.4902 3.34 -1.791000e-01 Prophet
Nigeria Cardiovascular diseases ARIMA 0.7164 3.98 0.000000e+00 ARIMA
Nigeria Cardiovascular diseases Prophet 4.4984 24.97 0.000000e+00 ARIMA
Nigeria Cardiovascular diseases Random Forest 3.6177 20.11 0.000000e+00 ARIMA
Nigeria Diabetes ARIMA 0.0000 0.00 1.000000e+00 ARIMA
Nigeria Diabetes Prophet 0.1408 2.06 0.000000e+00 ARIMA
Nigeria Diabetes Random Forest 0.0027 0.04 0.000000e+00 ARIMA
Nigeria Life expectancy ARIMA 0.7003 1.17 -1.846100e+00 Prophet
Nigeria Life expectancy Prophet 0.3693 0.58 2.086000e-01 Prophet
Nigeria Life expectancy Random Forest 1.2444 2.14 -7.985900e+00 Prophet
United States Cardiovascular diseases ARIMA 1.1904 1.29 0.000000e+00 ARIMA
United States Cardiovascular diseases Prophet 11.9749 12.93 0.000000e+00 ARIMA
United States Cardiovascular diseases Random Forest 10.0919 10.96 0.000000e+00 ARIMA
United States Diabetes ARIMA 0.0080 0.10 0.000000e+00 Random Forest
United States Diabetes Prophet 0.4896 6.65 0.000000e+00 Random Forest
United States Diabetes Random Forest 0.0040 0.05 0.000000e+00 Random Forest
United States Life expectancy ARIMA 1.9969 2.10 -1.796800e+00 Random Forest
United States Life expectancy Prophet 1.5614 1.63 -7.100000e-01 Random Forest
United States Life expectancy Random Forest 1.2177 1.39 -3.990000e-02 Random Forest
# Plot Acutal vs Predict (RF, ARIMA, Prophet )
# Plots Testing- Actual vs Predict (RF, ARIMA, Prophet) - 18 July
import matplotlib.pyplot as plt
import numpy as np
def plot_target_forecast(df, country, target):
# Filter data for country & target
df_ct = df[(df['Country'] == country) & (df['Target'] == target)].sort_values('Year')
# Extract data
years = df_ct['Year']
arima = df_ct['ARIMA_Forecast']
rf = df_ct['RF_Forecast']
prophet = df_ct['Prophet_Forecast']
# Actual years
actual_years = [2021, 2022, 2023]
forecast_years = list(range(2024, 2075))
actual_mask = df_ct['Year'].isin(actual_years)
# Use ARIMA prediction as proxy for observed if needed
actual_vals = arima[actual_mask]
# Start plot
plt.figure(figsize=(13, 6))
# Forecast region shading
plt.axvspan(2024, 2074, color='gray', alpha=0.12, label='Forecast Horizon')
# Plot forecasts
plt.plot(years, arima, color='forestgreen', linewidth=2, label='ARIMA Forecast')
plt.plot(years, prophet, color='darkorchid', linewidth=2, label='Prophet Forecast')
plt.plot(years, rf, color='navy', linewidth=2, label='Random Forest Forecast')
# Plot actual values
plt.scatter(df_ct.loc[actual_mask, 'Year'], actual_vals,
color='orange', edgecolor='black', s=90,
label='Observed (2021–2023)', zorder=5)
# Final touches
plt.title(f"{target} Forecast — {country}", fontsize=16)
plt.xlabel("Year")
plt.ylabel("Value")
plt.grid(True)
plt.legend()
plt.tight_layout()
plt.show()
selected_countries = [
'United States', 'Germany', 'Japan', 'Brazil', 'India',
'Indonesia', 'Nigeria', 'Kenya', 'Mexico', 'Bangladesh'
]
for country in selected_countries:
for target in ["Life expectancy", "Diabetes", "Cardiovascular diseases"]:
plot_target_forecast(df_model_comparison, country, target)
# Plots Testing- Actual vs Predict (RF, ARIMA, Prophet) - 18 July - TESTING 2
forecast_summary = []
for country in selected_countries:
df_country = df_forecast_ready[df_forecast_ready['Country'] == country].sort_values('Year')
for target in target_columns:
if target not in df_country.columns:
continue
features = selected_features_dict.get(target, [])
available_features = [f for f in features if f in df_country.columns]
if not available_features:
continue
df_train = df_country[df_country['Year'].between(start_train, end_train)]
df_eval = df_country[df_country['Year'].isin(eval_years)]
df_forecast = df_country[df_country['Year'].isin(forecast_horizon)]
actual_eval = df_eval[target].values
#### ARIMA ####
arima_rmse, arima_forecast_eval, arima_forecast = None, [], []
try:
train_series = df_train[[target]].copy()
train_series.index = pd.date_range(start='1950', periods=len(train_series), freq='YE')
model = ARIMA(train_series, order=(1, 1, 1)).fit()
pred_eval_arima = model.predict(start=len(train_series), end=len(train_series)+len(df_eval)-1)
arima_rmse = np.sqrt(mean_squared_error(actual_eval, pred_eval_arima))
arima_forecast_eval = pred_eval_arima.tolist()
arima_forecast = model.predict(start=len(train_series)+len(df_eval),
end=len(train_series)+len(df_eval)+len(df_forecast)-1).tolist()
except:
pass
#### Prophet ####
prophet_rmse, prophet_forecast_eval, prophet_forecast = None, [], []
try:
prophet_df = df_train[['Year', target]].rename(columns={'Year': 'ds', target: 'y'})
prophet_df['ds'] = pd.to_datetime(prophet_df['ds'], format='%Y')
model = Prophet()
model.fit(prophet_df)
eval_dates = pd.DataFrame({'ds': pd.to_datetime(eval_years, format='%Y')})
forecast_eval_prophet = model.predict(eval_dates)
prophet_rmse = np.sqrt(mean_squared_error(actual_eval, forecast_eval_prophet['yhat'].values))
prophet_forecast_eval = forecast_eval_prophet['yhat'].tolist()
forecast_years_df = pd.DataFrame({'ds': pd.to_datetime(df_forecast['Year'], format='%Y')})
prophet_forecast = model.predict(forecast_years_df)['yhat'].tolist()
except:
pass
#### RF ####
rf_rmse, rf_forecast_eval, rf_forecast = None, [], []
try:
X = df_country[available_features]
y = df_country[target]
X_train = X[df_country['Year'].between(start_train, end_train)]
y_train = y[df_country['Year'].between(start_train, end_train)]
X_eval = X[df_country['Year'].isin(eval_years)]
y_eval = y[df_country['Year'].isin(eval_years)]
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
pred_eval_rf = model.predict(X_eval)
rf_rmse = np.sqrt(mean_squared_error(y_eval, pred_eval_rf))
rf_forecast_eval = pred_eval_rf.tolist()
X_forecast = X[df_country['Year'].isin(forecast_horizon)]
rf_forecast = model.predict(X_forecast).tolist() if not X_forecast.isnull().any(axis=1).any() else [None]*len(X_forecast)
except:
pass
# Append evaluation predictions
for i, year in enumerate(eval_years):
forecast_summary.append({
"Country": country,
"Target": target,
"Year": year,
"ARIMA_RMSE": arima_rmse,
"ARIMA_Forecast": arima_forecast_eval[i] if i < len(arima_forecast_eval) else None,
"Prophet_RMSE": prophet_rmse,
"Prophet_Forecast": prophet_forecast_eval[i] if i < len(prophet_forecast_eval) else None,
"RF_RMSE": rf_rmse,
"RF_Forecast": rf_forecast_eval[i] if i < len(rf_forecast_eval) else None
})
# Append future forecast predictions
for i, year in enumerate(df_forecast['Year']):
forecast_summary.append({
"Country": country,
"Target": target,
"Year": year,
"ARIMA_RMSE": arima_rmse,
"ARIMA_Forecast": arima_forecast[i] if i < len(arima_forecast) else None,
"Prophet_RMSE": prophet_rmse,
"Prophet_Forecast": prophet_forecast[i] if i < len(prophet_forecast) else None,
"RF_RMSE": rf_rmse,
"RF_Forecast": rf_forecast[i] if i < len(rf_forecast) else None
})
df_model_comparison = pd.DataFrame(forecast_summary).sort_values(["Country", "Target", "Year"])
def plot_target_forecast(df_model_all, df_eval_ready, country, target):
eval_years = [2021, 2022, 2023]
forecast_years = list(range(2024, 2075))
full_years = eval_years + forecast_years
df_actual = df_eval_ready[
(df_eval_ready['Country'] == country) &
(df_eval_ready['Year'].isin(eval_years))
][['Year', target]].sort_values('Year')
df_plot = df_model_all[
(df_model_all['Country'] == country) &
(df_model_all['Target'] == target) &
(df_model_all['Year'].isin(full_years))
].sort_values('Year')
years = df_plot['Year'].values
rf_vals = df_plot['RF_Forecast'].values
arima_vals = df_plot['ARIMA_Forecast'].values
prophet_vals = df_plot['Prophet_Forecast'].values
# Build actual line
actual_line = []
for yr in years:
val = df_actual[df_actual['Year'] == yr][target]
actual_line.append(val.values[0] if not val.empty else np.nan)
# Plot
plt.figure(figsize=(13, 6))
plt.axvspan(2024, 2074, color='gray', alpha=0.12, label='Forecast Horizon')
plt.plot(years, actual_line, label="🟧 Actual", color='orange', linewidth=2)
plt.plot(years, rf_vals, label="🔵 RF Prediction", color='dodgerblue', linewidth=2)
plt.plot(years, arima_vals, label="🟩 ARIMA Prediction", color='forestgreen', linewidth=2)
plt.plot(years, prophet_vals, label="🟣 Prophet Prediction", color='darkorchid', linewidth=2)
plt.title(f"{target} — Actual & Forecast Comparison ({country})", fontsize=16)
plt.xlabel("Year")
plt.ylabel("Value")
plt.grid(True)
plt.legend()
plt.tight_layout()
plt.show()
selected_countries = [
'United States', 'Germany', 'Japan', 'Brazil', 'India',
'Indonesia', 'Nigeria', 'Kenya', 'Mexico', 'Bangladesh'
]
selected_targets = ["Life expectancy", "Diabetes", "Cardiovascular diseases"]
for country in selected_countries:
for target in selected_targets:
plot_target_forecast(df_model_comparison, df_forecast_ready, country, target)
INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/xcn15p9f.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/fq7s2wc4.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=56323', 'data', 'file=/tmp/tmprjkocm4m/xcn15p9f.json', 'init=/tmp/tmprjkocm4m/fq7s2wc4.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelv0_il2me/prophet_model-20250723150106.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 15:01:06 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 15:01:08 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/3i0cknr4.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/n0ji9b3r.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=90929', 'data', 'file=/tmp/tmprjkocm4m/3i0cknr4.json', 'init=/tmp/tmprjkocm4m/n0ji9b3r.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelg7_3e6i4/prophet_model-20250723150110.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 15:01:10 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 15:01:10 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/94b6i4ty.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/wiku0dsr.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=97008', 'data', 'file=/tmp/tmprjkocm4m/94b6i4ty.json', 'init=/tmp/tmprjkocm4m/wiku0dsr.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelh4lwn385/prophet_model-20250723150111.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 15:01:11 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 15:01:12 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/qg7p85e3.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/4sctqmvy.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=44341', 'data', 'file=/tmp/tmprjkocm4m/qg7p85e3.json', 'init=/tmp/tmprjkocm4m/4sctqmvy.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelw54y1lrb/prophet_model-20250723150112.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 15:01:12 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 15:01:13 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/jhcw5fgj.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/gee4nwbv.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=89937', 'data', 'file=/tmp/tmprjkocm4m/jhcw5fgj.json', 'init=/tmp/tmprjkocm4m/gee4nwbv.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelpy6sfrua/prophet_model-20250723150113.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 15:01:13 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 15:01:13 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/hya61vxx.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/gqd80ecs.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=95356', 'data', 'file=/tmp/tmprjkocm4m/hya61vxx.json', 'init=/tmp/tmprjkocm4m/gqd80ecs.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modeltqfulq0n/prophet_model-20250723150114.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 15:01:14 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 15:01:14 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/ko5xb_ld.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/41l5k7jf.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=6632', 'data', 'file=/tmp/tmprjkocm4m/ko5xb_ld.json', 'init=/tmp/tmprjkocm4m/41l5k7jf.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelwh4bbhju/prophet_model-20250723150115.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 15:01:15 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 15:01:15 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/brryefbv.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/79kzkg23.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=45981', 'data', 'file=/tmp/tmprjkocm4m/brryefbv.json', 'init=/tmp/tmprjkocm4m/79kzkg23.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_model8a83zuam/prophet_model-20250723150115.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 15:01:15 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 15:01:16 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/1vtpzvl4.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/zo3hc0yn.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=27575', 'data', 'file=/tmp/tmprjkocm4m/1vtpzvl4.json', 'init=/tmp/tmprjkocm4m/zo3hc0yn.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelyvq_dlwd/prophet_model-20250723150116.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 15:01:16 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 15:01:16 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/e7rkf5ty.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/iqbldoj4.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=39930', 'data', 'file=/tmp/tmprjkocm4m/e7rkf5ty.json', 'init=/tmp/tmprjkocm4m/iqbldoj4.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_model2c3lh1fk/prophet_model-20250723150117.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 15:01:17 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 15:01:17 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/yq_o5a0j.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/q586xxt7.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=49176', 'data', 'file=/tmp/tmprjkocm4m/yq_o5a0j.json', 'init=/tmp/tmprjkocm4m/q586xxt7.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_model47zxobpr/prophet_model-20250723150117.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 15:01:17 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 15:01:18 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/4a97gqcs.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/b2zn8__y.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=25471', 'data', 'file=/tmp/tmprjkocm4m/4a97gqcs.json', 'init=/tmp/tmprjkocm4m/b2zn8__y.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_model_33hc8lw/prophet_model-20250723150118.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 15:01:18 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 15:01:18 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/buq0wfdv.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/en47nn7i.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=33930', 'data', 'file=/tmp/tmprjkocm4m/buq0wfdv.json', 'init=/tmp/tmprjkocm4m/en47nn7i.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_model05mkl102/prophet_model-20250723150119.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 15:01:19 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 15:01:19 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/6k7ywp4j.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/s5jjzobg.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=4964', 'data', 'file=/tmp/tmprjkocm4m/6k7ywp4j.json', 'init=/tmp/tmprjkocm4m/s5jjzobg.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_model_qx7pexn/prophet_model-20250723150120.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 15:01:20 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 15:01:20 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/4tykcuex.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/4uz180rf.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=19771', 'data', 'file=/tmp/tmprjkocm4m/4tykcuex.json', 'init=/tmp/tmprjkocm4m/4uz180rf.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modeleq3wjnzq/prophet_model-20250723150121.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 15:01:21 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 15:01:21 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/dub9k5jc.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/a_8fsjbb.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=27947', 'data', 'file=/tmp/tmprjkocm4m/dub9k5jc.json', 'init=/tmp/tmprjkocm4m/a_8fsjbb.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_model8cz227c3/prophet_model-20250723150122.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 15:01:22 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 15:01:23 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/e2sg94bz.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/ma6e3h6o.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=59076', 'data', 'file=/tmp/tmprjkocm4m/e2sg94bz.json', 'init=/tmp/tmprjkocm4m/ma6e3h6o.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modellx89hbtk/prophet_model-20250723150123.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 15:01:23 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 15:01:24 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/2foc1vzw.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/3p1hzrs8.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=25301', 'data', 'file=/tmp/tmprjkocm4m/2foc1vzw.json', 'init=/tmp/tmprjkocm4m/3p1hzrs8.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelnfinlbmz/prophet_model-20250723150124.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 15:01:24 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 15:01:24 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/4w6q0v9n.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/ttzzynt3.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=40853', 'data', 'file=/tmp/tmprjkocm4m/4w6q0v9n.json', 'init=/tmp/tmprjkocm4m/ttzzynt3.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_model5g2l9frw/prophet_model-20250723150125.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 15:01:25 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 15:01:25 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/q6sitoku.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/jxnz0p82.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=45329', 'data', 'file=/tmp/tmprjkocm4m/q6sitoku.json', 'init=/tmp/tmprjkocm4m/jxnz0p82.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelbomqvztx/prophet_model-20250723150125.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 15:01:25 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 15:01:26 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/sxj4mx1i.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/j3ff4el7.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=55135', 'data', 'file=/tmp/tmprjkocm4m/sxj4mx1i.json', 'init=/tmp/tmprjkocm4m/j3ff4el7.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_model0a2p2g90/prophet_model-20250723150126.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 15:01:26 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 15:01:26 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/gpjbd4kf.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/4be0jlhe.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=64708', 'data', 'file=/tmp/tmprjkocm4m/gpjbd4kf.json', 'init=/tmp/tmprjkocm4m/4be0jlhe.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelj_13ox_d/prophet_model-20250723150127.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 15:01:27 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 15:01:27 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/tyabzpsb.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/dsimj0a7.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=75251', 'data', 'file=/tmp/tmprjkocm4m/tyabzpsb.json', 'init=/tmp/tmprjkocm4m/dsimj0a7.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_model416aph0l/prophet_model-20250723150128.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 15:01:28 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 15:01:28 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/8eok89ec.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/ngwetz9c.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=13514', 'data', 'file=/tmp/tmprjkocm4m/8eok89ec.json', 'init=/tmp/tmprjkocm4m/ngwetz9c.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_model6ra7svlu/prophet_model-20250723150128.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 15:01:28 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 15:01:29 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/ctysfzxc.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/py2f0mie.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=79130', 'data', 'file=/tmp/tmprjkocm4m/ctysfzxc.json', 'init=/tmp/tmprjkocm4m/py2f0mie.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelfzuldb6m/prophet_model-20250723150129.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 15:01:29 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 15:01:30 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/sorq_sl7.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/3cpdd1hk.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=48566', 'data', 'file=/tmp/tmprjkocm4m/sorq_sl7.json', 'init=/tmp/tmprjkocm4m/3cpdd1hk.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_model5qngdowd/prophet_model-20250723150130.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 15:01:30 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 15:01:30 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/ts10rp61.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/9xnd_9dl.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=2376', 'data', 'file=/tmp/tmprjkocm4m/ts10rp61.json', 'init=/tmp/tmprjkocm4m/9xnd_9dl.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_model4cm1tcdf/prophet_model-20250723150131.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 15:01:31 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 15:01:31 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/ki_scwy0.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/pvgasxyf.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=23559', 'data', 'file=/tmp/tmprjkocm4m/ki_scwy0.json', 'init=/tmp/tmprjkocm4m/pvgasxyf.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelzzp4uuj8/prophet_model-20250723150131.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 15:01:31 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 15:01:32 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/jg311jj0.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/2bsjz2e7.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=9497', 'data', 'file=/tmp/tmprjkocm4m/jg311jj0.json', 'init=/tmp/tmprjkocm4m/2bsjz2e7.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelqgx6y8ti/prophet_model-20250723150132.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 15:01:32 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 15:01:32 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this. DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/v5h3o2i9.json DEBUG:cmdstanpy:input tempfile: /tmp/tmprjkocm4m/mpabi3zd.json DEBUG:cmdstanpy:idx 0 DEBUG:cmdstanpy:running CmdStan, num_threads: None DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.11/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=27575', 'data', 'file=/tmp/tmprjkocm4m/v5h3o2i9.json', 'init=/tmp/tmprjkocm4m/mpabi3zd.json', 'output', 'file=/tmp/tmprjkocm4m/prophet_modelj5nlrp04/prophet_model-20250723150133.csv', 'method=optimize', 'algorithm=newton', 'iter=10000'] 15:01:33 - cmdstanpy - INFO - Chain [1] start processing INFO:cmdstanpy:Chain [1] start processing 15:01:33 - cmdstanpy - INFO - Chain [1] done processing INFO:cmdstanpy:Chain [1] done processing